Abstract

The problem of point-source location using a multi-beam focal-plane staring array radar is addressed. It is viewed as a problem in functional approximation. The position of the source is regarded as a nonlinear function of the sampled radar image, and an approximant must be constructed from a training set so as to minimise the mean-square error in the position estimate. The problem is also one of generalisation, since the data encountered under the expected operating conditions are likely to be corrupted by noise, and this must be taken into account when designing the approximant. Two feed-forward network architectures are considered. The first is a particular radial basis function network, which arises as a consequence of the minimum mean-square error solution and is appropriate when the signal-to-noise ratio is small. The second is a multilayer perceptron, chosen for approximation at high signal-to-noise ratio. The errors in the position estimates for each of these approaches are compared with those of a maximum likelihood position estimation method. The maximum likelihood method gives better overall performance and has the advantage that it does not depend on the signal-to-noise ratio.
1 Introduction


The problem addressed in this paper is one of multi-sensor data analysis. Data generated by a given sensor system represent a particular view of the scene under consideration. The signal processing problem is to provide a description of that scene, given some a priori knowledge of the scene characteristics and knowledge of the properties of the sensor. For example, in an active radar situation, the scene may be composed of point scatterers, distributed scatterers, clutter, chaff and interference. One common form of a priori knowledge is the assumption that the scene can be represented spatially by a collection of point sources. There are many techniques for estimating the parameters of sources in a scene, and all of them require some form of training of the system. This may be achieved by moving a single source around in the far field and recording the output of the system. If the outputs of all sensors in the system are sampled simultaneously, we obtain a vector of numbers, the ‘image vector’, which gives a snapshot from the system for a given source position. All these vectors are collected together as the columns of a matrix, which forms a ‘reference library’ of the signals expected from each incident direction. This library lies on a two-dimensional manifold, termed the array manifold [18, 23], within the space of sensor outputs.

The particular sensor system we consider is a focal-plane array radar. Focal-plane array technology provides a wide multiple-beam field of view with no moving parts and benefits from a high level of front-end circuit integration [1]. It uses a lens to provide multiple-beam coverage over a wide field of view, together with a planar array of receivers, and requires no beam-forming circuitry. Furthermore, the individual receivers that make up the array are very small, and they can be designed so as not to contain any microwave circuitry. All of these factors combine to make the receiver and array architecture so simple that it is potentially possible to implement complete arrays within a small area of low-cost monolithic silicon. The two principal components of a focal-plane receiver front-end are a dielectric lens and an array of receivers. The lens focuses incoming radiation onto the antenna array. The combined operation of the lens and receiver array provides a multiplicity of beams, each with its own direction of look. The radiation pattern of each beam depends on the lens aperture, the properties of the lens, the responses of the receivers and their positions on the focal plane.

One of the main purposes of this paper is to show how adaptive feed-forward networks (or ‘neural’ networks) may be applied to the problem of point-source location using radar focal-plane arrays. Neural networks for sensor signal processing tasks are currently an area of considerable research [4, 25]. One particular area of interest is automatic target recognition [21]. Work to date has applied neural network techniques to data from a variety of sensors, including radar [2, 4, 6], sonar [8], infra-red and laser returns. These techniques have been used to identify various target types, such as ships, aircraft, munitions and ground vehicles in a clutter environment, as well as terrain types. Other signal processing problems being addressed include bearing estimation [9, 13], multitarget tracking [15] and radar signal categorisation [20].

The advantage of an adaptive network solution to the problem of point-source location is that the network implicitly provides a parametrisation of the point-spread function or array manifold (by the weights in the network), which obviates the need to store the point-spread function explicitly. Also, with technology currently being developed, there is the potential for an integrated solution on the focal plane of the system. Other techniques may be used, however. A maximum likelihood approach to position estimation has been considered in [27], and a neural network implementation of a maximum likelihood algorithm is described in [11]. The approach in this paper differs from that in [11] in two respects: we consider feed-forward architectures, and we consider the effects of noise on the position estimates.

The specific problem we shall consider in this paper is the estimation of the position of a single source in the scene, given its sampled image vector. For illustration, we restrict the analysis and the numerical examples in this paper to linear arrays, though the approach applies equally to two-dimensional arrays. Section 2 describes the generation of an image vector and how a library of such vectors may be used in the problem of point-source location. In Section 3 we consider the problem of deriving an approximation to a known functional transform which generalises to points not in the data set and which approximates the function in a minimum mean-square error sense. Section 4 gives a brief description of adaptive feed-forward networks and methods of training such networks. Section 5 considers the application of the network to point-source location, with the specific example of an idealised linear array of receivers in the focal plane of an imaging system. The problem is one of generalisation. In a practical situation, the array outputs (the inputs to the feed-forward network) are likely to be corrupted by noise. Therefore, we wish to design a network that generalises from the noiseless training data set to input vectors corrupted by noise. The network is based on the training data characterising the array manifold, which may be generated from a model of the imaging process or obtained during some calibration of the system. Two types of network are considered. One is a particular radial basis function network (see Section 4), appropriate when the expected noise “in operation” is large (a low signal-to-noise ratio). The second is a multilayer perceptron architecture designed for high signal-to-noise ratios. The performance of these networks is compared with that of a maximum likelihood approach. Finally, the paper concludes with a discussion of the results, a summary of the advantages and disadvantages of using a network for point-source location, and some suggestions for further work.


2 The Imaging Problem


The problem of point-source location may be posed as one of image restoration, in which we wish to reconstruct a scene from a set of measurements of its image, given some knowledge of the imaging operation. This knowledge is often expressed in terms of the point-spread function, usually specified as a library of vectors. This library of vectors is generated from the outputs of an array of $N$ sensors in the focal plane of an imaging system as follows.

The one-dimensional imaging equation relating a time-varying image $g(x;t)$ to the scene $f(\xi;t)$ is given by a convolution equation of the form

$$g(x;t) = \int h(x;\xi)\, f(\xi;t)\, d\xi + n(x;t) \qquad (1)$$

where $h(x;\xi)$ is the point-spread function of the imaging system and $n(x;t)$ is the noise in the degraded image.

When the image is sampled, it is known at only a finite number of points in the image plane, $(x_1, x_2, \ldots, x_N)$. In the focal-plane imaging problem these points correspond to the array receiver positions, and Equation (1) becomes

$$g_i(t) = \int h_i(\xi)\, f(\xi;t)\, d\xi + n_i(t), \qquad i = 1, \ldots, N \qquad (2)$$

where $g_i(t)$ is the value of the sampled image at position $x_i$ at time $t$ and $N$ is the number of image sample positions (the number of receivers). The function $h_i(\xi)$ is the response at position $x_i$ to a point source in the far field, as a function of the position $\xi$ of that source. Consider an ideal, diffraction-limited, space-invariant imaging system (one which acts uniformly across image and object planes) with a narrow slit as the aperture. For such a system the response $h_i(\xi)$ is given by

$$h_i(\xi) = \frac{\sin[\Omega(x_i - \xi)]}{\pi(x_i - \xi)} \qquad (3)$$


In the examples of Section 5, we take $\Omega = \pi$, so that sampling at the Nyquist rate ($\pi/\Omega$) gives unit spacing of the sample points.

If noise effects are absent, a point source of unit amplitude in the far field at position $\xi_0$ gives rise to an image vector

$$\mathbf{h}(\xi_0) = (h_1(\xi_0), h_2(\xi_0), \ldots, h_N(\xi_0))^t \qquad (4)$$

where $t$ denotes vector transpose. The library of vectors used to characterise the imaging operation consists of two parts. The first is a set of $P$ such images of sources, $(\mathbf{h}_1, \mathbf{h}_2, \ldots, \mathbf{h}_P)$, at $P$ different positions in the scene. These images lie on a one-dimensional manifold, termed the array manifold, in the $N$-dimensional space of sensor outputs. The second is the set of corresponding positions $\{\xi_i,\ i = 1, \ldots, P\}$. Thus the data points used to characterise the imaging operation are the points $\{(\mathbf{h}_i, \xi_i),\ i = 1, \ldots, P\}$ in the space $\mathbb{R}^N \times \mathbb{R}$. In the terminology of feed-forward networks, this is referred to as the training set. The image vectors $\mathbf{h}_i$ will in general be complex-valued, though in our examples we shall consider the ideal responses given by Equation (3), which are real-valued.
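The Python fragment below is a minimal sketch of how such a library might be generated. It evaluates the ideal response of Equation (3) with $\Omega = \pi$ and stacks the image vectors of Equation (4) as columns. The receiver layout (unit spacing at $x_i = 0, 1, \ldots, N-1$), the number of receivers and the grid of source positions are illustrative assumptions. The helper names are hypothetical.

```python
import numpy as np


def response(x_i, xi, omega=np.pi):
    """Ideal slit-aperture response of Eq. (3): sin[Omega (x_i - xi)] / [pi (x_i - xi)].

    np.sinc(u) = sin(pi u) / (pi u), so the response equals (Omega/pi) * sinc(Omega d / pi);
    this also handles the removable singularity at x_i == xi (value Omega/pi).
    """
    d = np.asarray(x_i, dtype=float) - xi
    return (omega / np.pi) * np.sinc(omega * d / np.pi)


def image_vector(xi, n_receivers, omega=np.pi):
    """Noiseless image vector h(xi) of Eq. (4); receivers assumed at x_i = 0, 1, ..., N-1."""
    receivers = np.arange(n_receivers, dtype=float)
    return response(receivers, xi, omega)


def build_library(positions, n_receivers, omega=np.pi):
    """Reference library: column p is h(xi_p). Returns (H, positions), i.e. the training set."""
    positions = np.asarray(positions, dtype=float)
    H = np.column_stack([image_vector(xi, n_receivers, omega) for xi in positions])
    return H, positions


# Hypothetical example: 8 receivers, P = 71 source positions spaced 0.1 apart.
H, xis = build_library(np.linspace(0.0, 7.0, 71), n_receivers=8)
print(H.shape)  # (8, 71)
```

With $\Omega = \pi$, a source located exactly at a receiver position excites only that receiver, because the response of Equation (3) vanishes at every other integer offset.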

The problem of image restoration is to reconstruct an object from its band-limited image, given knowledge of the point-spread function and some a priori knowledge concerning the object. The particular constraint that we consider here is that the scene consists of a single point source of unknown amplitude $A$. The problem addressed is the determination of the position of the source given its image vector $\mathbf{I}$, which is of the form

$$\mathbf{I} = A\mathbf{h} + \mathbf{n},$$

where $\mathbf{n}$ is a noise vector. Thus, the image vector is a version of a vector $\mathbf{h}$ that has been scaled by the amplitude $A$ (which may be complex) and corrupted by additive noise. The vector $\mathbf{h}$ lies on the array manifold but is not necessarily a member of the training set.
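For comparison with the network approaches, one standard way to estimate the position from $\mathbf{I} = A\mathbf{h} + \mathbf{n}$ is maximum likelihood. This is an assumption here, not necessarily the method used in Section 5, and it applies when $A$ is real and $\mathbf{n}$ is white Gaussian noise. Maximising the likelihood over $A$ gives $\hat{A} = \mathbf{h}^t\mathbf{I}/|\mathbf{h}|^2$, so the source position is the one that maximises $(\mathbf{h}(\xi)^t\mathbf{I})^2/|\mathbf{h}(\xi)|^2$. The sketch below searches a grid of candidate positions. The array layout, noise level and function names are hypothetical.

```python
import numpy as np


def response(x_i, xi, omega=np.pi):
    """Ideal response of Eq. (3), written via np.sinc(u) = sin(pi u) / (pi u)."""
    d = np.asarray(x_i, dtype=float) - xi
    return (omega / np.pi) * np.sinc(omega * d / np.pi)


def ml_position(image, grid, omega=np.pi):
    """Grid-search ML estimate of xi0 from image = A h(xi0) + n.

    Assumes a real amplitude A and white Gaussian noise. For fixed xi the
    likelihood is maximised by A_hat = h^t I / |h|^2, which leaves
    (h^t I)^2 / |h|^2 to be maximised over the candidate positions.
    """
    image = np.asarray(image, dtype=float)
    grid = np.asarray(grid, dtype=float)
    receivers = np.arange(image.size, dtype=float)           # assumed layout x_i = 0, ..., N-1
    Hg = response(receivers[:, None], grid[None, :], omega)  # (N, G): column g is h(grid[g])
    score = (Hg.T @ image) ** 2 / np.sum(Hg ** 2, axis=0)
    return grid[np.argmax(score)]


# Hypothetical scenario: 8 receivers, source at 3.3, unit amplitude, noise sd 0.02.
rng = np.random.default_rng(1)
n_receivers, xi0, amplitude, noise_sd = 8, 3.3, 1.0, 0.02
I = amplitude * response(np.arange(n_receivers), xi0) + noise_sd * rng.standard_normal(n_receivers)
grid = np.linspace(0.0, n_receivers - 1.0, 701)               # candidate positions, step 0.01
print(ml_position(I, grid))
```

Because the criterion is invariant to the scale and sign of $\mathbf{I}$, the unknown amplitude never needs to be estimated explicitly.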


3 Minimum Mean-square Estimate

Before we discuss feed-forward networks in Section 4, and in particular their application to position estimation in Section 5, we present some functional approximation preliminaries. We view the problem of point-source location, given an image vector, as one of nonlinear function approximation and generalisation. That is, we regard the position of the source in the scene as a nonlinear function of the image, with the form of the nonlinear function being specified by the training data. Also, we wish to generalise to data points not in the training set. This was the problem addressed in [26]. In this section we present some results relating to the approximation of a function $f(x)$, where $x$ is a noiseless data sample, by a function $g(z)$, where $z$ is a data sample corrupted by additive noise.

Suppose that we wish to approximate a transformation $f$ from $\mathbb{R}^n$ to $\mathbb{R}^{n'}$. Let the approximation be given by $g$, which is chosen so that the quantity $V$, defined by

$$V = \int\!\!\int |f(x) - g(x + \xi)|^2\, p_n(\xi)\, p(x)\, dx\, d\xi \qquad (5)$$

is a minimum. Here $p_n(\xi)$ is the probability density function of a noise distribution in the space $\mathbb{R}^n$, and $p(x)$ defines the distribution of data points $x$ in the space $\mathbb{R}^n$. Equation (5) defines the expected square error in the approximation when the data points in the domain of $f$ are corrupted by additive noise. For $z = x + \xi$, it may be written as

$$V = \int\!\!\int |f(x) - g(z)|^2\, p_n(z - x)\, p(x)\, dx\, dz. \qquad (6)$$


Minimising with respect to the function $g$ gives the solution

$$g(z) = \frac{\int f(x)\, p_n(z - x)\, p(x)\, dx}{\int p_n(z - x)\, p(x)\, dx} \qquad (7)$$

This is the approximation to the function $f$ for which the expected square error in the functional value, integrated over the domain of $f$, is a minimum. It generalises $f$ to points $z$ outside the distribution of the data points $x$.

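More generally, the minimum mean-square estimate of $f$ given $z$ is the mean of $f$ under the a posteriori density [7],

$$g(z) = E[f(x)\,|\,z] = \int f(x)\, p(x|z)\, dx = \frac{\int f(x)\, p(z|x)\, p(x)\, dx}{\int p(z|x)\, p(x)\, dx} \qquad (8)$$

For additive noise, $p(z|x) = p_n(z - x)$, and Equation (8) reduces to Equation (7).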


Note that the function $g(z)$ may be defined over the whole space $\mathbb{R}^n$, whereas the data points $x$ may lie on a reduced-dimension manifold, $X$, in $\mathbb{R}^n$ (as specified by the probability density function $p(x)$). Thus, the approximation $g(z)$ to $f$ is defined for values of $z$ which do not necessarily lie on the manifold $X$. This is important in many applications in which noise will corrupt data points $x$ to give values $z = x + \xi$ which lie outside the domain of $f$. In these situations it is not sufficient to interpolate the training set $\{(x_i, f_i),\ i = 1, \ldots, P\}$ without due regard to defining the mapping for points outside the manifold.

The minimum mean-square approximation derived above provides a biased estimate. For a data point $x_0$, the mean of the estimate (the average over all perturbations $\xi$ to $x_0$) is not necessarily equal to the functional value $f(x_0)$, i.e.

$$\int g(z)\, p(z|x_0)\, dz \neq f(x_0) \qquad (9)$$

where $z = x_0 + \xi$. In some practical situations it may be advantageous to have an unbiased estimate, so that integration may be performed after the functional transformation. That is, we need to produce an approximation $g(z)$ which is defined for all noise perturbations and which, when averaged over inputs $z = x_0 + \xi$, tends to $f(x_0)$, the true value in the absence of noise. An approach for finding such an unbiased approximation using Lagrange multipliers is given in [26].
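Equations (7) and (9) can be illustrated on a deliberately simple one-dimensional toy problem, which is an assumption for illustration only: $f(x) = x$, with $x$ uniformly distributed on $[0, 1]$ and Gaussian noise of standard deviation $\sigma$. The sketch evaluates Equation (7) by the midpoint rule and shows that $g(z)$ is defined for $z$ outside $[0, 1]$. It then estimates the left-hand side of Equation (9) at $x_0 = 1$ by Monte Carlo averaging. The function name is hypothetical.

```python
import numpy as np


def mmse_estimate(z, sigma, n_grid=400):
    """Eq. (7) for a toy problem: f(x) = x, p(x) uniform on [0, 1], noise N(0, sigma^2).

    The integrals are evaluated with the midpoint rule; the uniform density and the
    Gaussian normalising constant cancel between numerator and denominator.
    """
    z = np.atleast_1d(np.asarray(z, dtype=float))
    x = (np.arange(n_grid) + 0.5) / n_grid                  # midpoints of [0, 1]
    log_w = -((z[:, None] - x[None, :]) ** 2) / (2.0 * sigma ** 2)
    log_w -= log_w.max(axis=1, keepdims=True)               # guard against underflow
    w = np.exp(log_w)
    return (w @ x) / w.sum(axis=1)                          # f(x) = x in the numerator


sigma = 0.2
print(mmse_estimate([0.5, 1.5], sigma))   # defined even for z = 1.5, outside [0, 1]

# Bias, Eq. (9): average the estimate over noisy observations z = x0 + xi of x0 = 1.
rng = np.random.default_rng(0)
z_samples = 1.0 + sigma * rng.standard_normal(5000)
print(mmse_estimate(z_samples, sigma).mean())
```

Near the edge of the data interval the estimate is pulled towards the interior. Its average therefore falls short of $f(x_0) = 1$, which is the bias expressed by Equation (9).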


3.1 Radial Basis Function Approximation

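Consider a function $f$ defined by a finite set of points $\{(x_i, f_i),\ i = 1, \ldots, P\}$ in $\mathbb{R}^n \times \mathbb{R}^{n'}$. Provided that the integrands in Equation (7) are sufficiently smooth, the solution $g$ may be approximated by $\hat{g}$, given by

$$\hat{g}(z) = \frac{\sum_{i=1}^{P} f_i\, p_n(z - x_i)}{\sum_{i=1}^{P} p_n(z - x_i)} \qquad (10)$$

$$\hat{g}(z) = \sum_{i=1}^{P} f_i\, \bar{p}_n(z - x_i) \qquad (11)$$

where $\bar{p}_n(z - x_i)$ is defined by

$$\bar{p}_n(z - x_i) = \frac{p_n(z - x_i)}{\sum_{j=1}^{P} p_n(z - x_j)} \qquad (12)$$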


Equation (11) is identical in form to radial basis function approximations [3]: the approximating function is a linear combination of (specified) nonlinear functions of the difference between a data point, $z$, and a ‘centre’. In this case the nonlinear basis functions are determined by the noise probability density function and the centres by the data points. The weights are the function values, $f_i$, at the centres. Thus a radial basis function network structure arises as a natural consequence of the minimum variance solution. For example, for a Gaussian noise model with diagonal covariance matrix with equal diagonal elements $\sigma^2$,

$$\hat{g}(z) = \frac{\sum_{i=1}^{P} f_i \exp\left[-\frac{1}{2\sigma^2}|z - x_i|^2\right]}{\sum_{i=1}^{P} \exp\left[-\frac{1}{2\sigma^2}|z - x_i|^2\right]} \qquad (13)$$


Note that in order to derive the function $g$ which approximates $f$ and generalises to unseen data, we have assumed neither a specific functional form nor a smoothness condition (as in a regularisation theory approach). We have assumed only that we know how to perform the mapping if there were no noise (noiseless training data), together with a minimum mean-square error measure. A consequence of this is the radial basis function nature of the solution. However, we do need to know the noise distribution. If we assume that it is Gaussian with diagonal covariance matrix with equal elements, then we need to specify the noise variance, $\sigma^2$, on the test data.

The function $\hat{g}$ will provide a good approximation to the exact minimum mean-square solution $g$ if the standard deviation of the noise is large compared with the distance between sample points.
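Below is a minimal sketch of the Gaussian-noise estimator of Equation (13), applied to point-source location. It assumes unit source amplitude, eight receivers at unit spacing with the ideal response of Equation (3), and a hypothetical training set of 71 source positions. The weights are computed in the log domain so that small values of $\sigma$ do not underflow. The function names are illustrative.

```python
import numpy as np


def response(x_i, xi, omega=np.pi):
    """Ideal response of Eq. (3), written via np.sinc(u) = sin(pi u) / (pi u)."""
    d = np.asarray(x_i, dtype=float) - xi
    return (omega / np.pi) * np.sinc(omega * d / np.pi)


def rbf_estimate(z, centres, values, sigma):
    """Normalised Gaussian RBF approximation of Eq. (13).

    centres: (P, n) array of noiseless training inputs x_i (the centres)
    values:  (P,) or (P, n') array of training outputs f_i (the weights)
    sigma:   assumed standard deviation of the noise on the test data
    """
    d2 = np.sum((np.asarray(centres) - np.asarray(z)) ** 2, axis=1)  # |z - x_i|^2
    log_w = -d2 / (2.0 * sigma ** 2)
    log_w -= log_w.max()               # common factor cancels in the ratio
    w = np.exp(log_w)
    return (w @ np.asarray(values, dtype=float)) / w.sum()


# Hypothetical training set: receivers at x = 0..7, unit-amplitude sources at 71 positions.
receivers = np.arange(8, dtype=float)
positions = np.linspace(0.0, 7.0, 71)
centres = np.stack([response(receivers, xi) for xi in positions])   # (71, 8)

z = response(receivers, 2.5)                               # noiseless image of a source at 2.5
print(rbf_estimate(z, centres, positions, sigma=1e-3))    # nearest centre dominates
print(rbf_estimate(z, centres, positions, sigma=1e6))     # tends to the mean position
```

The two limits show how strongly the estimate depends on the assumed noise level. With $\sigma$ much smaller than the spacing between library vectors, the estimate collapses onto the nearest centre. With very large $\sigma$, it tends to the mean of the training positions.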
3.2 Perturbation Analysis for High Signal-to-Noise Ratios

The solution for the minimum variance approximation to a known function $f: \mathbb{R}^n \to \mathbb{R}^{n'}$ is given by Equation (7). When the functional transformation is specified only by points in $\mathbb{R}^n \times \mathbb{R}^{n'}$, this minimum variance solution may be approximated by a summation. The summation takes the form of a radial basis function network whose nonlinear functions are (normalised) noise probability density functions. It will be a good approximation to the minimum variance solution provided that the standard deviation of the noise distribution is large compared with the spacing between samples $x_i$. Consider instead a low-noise situation, where the standard deviation of the noise distribution is small compared with the distance between sample points. Here the approximation $\hat{g}(z)$ to $g(z)$ will be accurate only in the region of the sample points and will be very poor at intermediate values. Therefore, we need to specify a model for the approximation to $f(x)$, or a constraint in the form of a regularisation term, in order to describe how the function varies between sample points.

Let us assume that we have a parameterised model for the approximation to $f$. In the following section we shall consider a specific model, namely a feed-forward network. For the moment, however, the only restriction on its form is that it is a continuous function, $g$, of the data $x$, with continuous first and second derivatives. First, we calculate the perturbation, caused by noise on the data points, to the error between the actual values, $f$, and the approximate values.

Let $\{(x_i, f_i),\ i = 1, \ldots, P\}$ denote the set of points describing the mapping $f: \mathbb{R}^n \to \mathbb{R}^{n'}$. For a given data value $x_p$, let $E_p = E(x_p)$ be the error between the approximation to $f(x)$ and the desired value $f_p$ for the $p$th pattern, $x_p$. Often, the total error $E_T$ is given by

$$E_T = \frac{1}{P}\sum_{p=1}^{P} E_p = \frac{1}{P}\sum_{p=1}^{P} E(x_p) \qquad (14)$$

with $E(x_p)$ being the square of the error, for pattern $x_p$, between the desired value and the approximation. (In a feed-forward network framework the desired values are termed the ‘target’ values.) This choice gives $E_T$ as the sum-square error between the approximations and the desired values. However, in the analysis which follows we impose no such restriction.

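Suppose the input patterns are corrupted by noise, i.e. they are of the form $x_p + n$, where the zero-mean noise vector $n$ has the property $\langle n n^t \rangle = \sigma^2 I$ ($I$ is the $n \times n$ identity matrix). It is shown in Appendix B that the expected error at the output, $\langle E_T \rangle$, may then be written

$$\langle E_T \rangle = \frac{1}{P}\sum_{p=1}^{P} E(x_p) + \frac{\sigma^2}{2P}\sum_{p=1}^{P} \mathrm{Tr}(H_p) \qquad (15)$$

where $H_p$ is the $n \times n$ Hessian matrix of second derivatives of $E$ with respect to the components of the input, evaluated at $x_p$.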

The first term in the expression is the error in the approximation when there is no noise on the data. The second term is a second-derivative quantity proportional to the noise variance $\sigma^2$. For $\sigma^2 = 0$, $\langle E_T \rangle$ reduces to the usual error term in the absence of noise. Consider a mapping $f: \mathbb{R}^n \to \mathbb{R}^{n'}$ defined by points in $\mathbb{R}^n \times \mathbb{R}^{n'}$, where the data points in $\mathbb{R}^n$ are corrupted by additive noise with zero mean and variance $\sigma^2$. Suppose the variance is small enough that the higher-order terms in the Taylor expansion may be neglected. Then minimising the error over all patterns and over the noise distribution, with respect to the parameters of the approximating function $g$, is equivalent to minimising a modified error term defined on the patterns in the absence of noise. Equation (15) shows that the effects of noise on the test data can be compensated for by training an approximant with a modified error criterion. A different approximation to $f$ can be derived for each value of the noise variance $\sigma^2$.

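The two terms in Equation (15) may be regarded as the usual error metric plus a regularisation or stabilising term, whose regularisation parameter is $\sigma^2$, the variance of the noise on the inputs. For the sum-squared error criterion, the second term in Equation (15) may be written as

$$\frac{\sigma^2}{P}\sum_{p=1}^{P} \left( \|J_p\|^2 - (f_p - g(x_p))^t q_p \right) \qquad (16)$$

where the $n' \times n$ matrix $J_p$ is the Jacobian

$$(J_p)_{ij} = \left.\frac{\partial g_i}{\partial x_j}\right|_{x_p} \qquad (17)$$

Its $(i, j)$ element is the derivative of the $i$th component of the approximation with respect to the $j$th input, evaluated for pattern $x_p$, and $\|J_p\|^2$ denotes the sum of the squares of its elements. The vector $q_p = (q_1^p, q_2^p, \ldots, q_{n'}^p)^t$ is a vector of second-derivative terms with $i$th component

$$q_i^p = \sum_{j=1}^{n} \left.\frac{\partial^2 g_i}{\partial x_j^2}\right|_{x_p} \qquad (18)$$

evaluated for the $p$th pattern.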
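Equations (15) and (16) can be checked numerically for a single pattern ($P = 1$). The sketch below uses a hypothetical smooth approximant $g: \mathbb{R}^2 \to \mathbb{R}$ and computes $J_p$ and $q_p$ by central differences. It then compares the predicted expected error with a Monte Carlo average over input noise with $\langle n n^t \rangle = \sigma^2 I$. The function, target value, pattern and noise level are arbitrary choices for illustration.

```python
import numpy as np


def g(x):
    """Hypothetical smooth approximant g: R^2 -> R^1 (toy choice, not from the paper)."""
    return np.array([np.sin(x[0]) + 0.5 * x[1] ** 2])


def jacobian_and_q(func, x, h=1e-4):
    """Central-difference Jacobian J (Eq. (17)) and vector q of Eq. (18)."""
    x = np.asarray(x, dtype=float)
    g0 = func(x)
    J = np.zeros((g0.size, x.size))
    q = np.zeros(g0.size)
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = h
        g_plus, g_minus = func(x + step), func(x - step)
        J[:, j] = (g_plus - g_minus) / (2.0 * h)          # d g_i / d x_j
        q += (g_plus - 2.0 * g0 + g_minus) / h ** 2       # accumulate d^2 g_i / d x_j^2
    return J, q


f_p = np.array([0.3])          # target value (hypothetical)
x_p = np.array([0.4, 0.2])     # noiseless input pattern (hypothetical)
sigma = 0.05                   # input noise standard deviation

J, q = jacobian_and_q(g, x_p)
r = f_p - g(x_p)                                                       # f_p - g(x_p)
noiseless = float(r @ r)                                               # E(x_p)
predicted = noiseless + sigma ** 2 * (np.sum(J ** 2) - float(r @ q))   # Eqs. (15), (16), P = 1

# Monte Carlo: average the squared error over noisy inputs x_p + n, <n n^t> = sigma^2 I.
rng = np.random.default_rng(0)
noise = sigma * rng.standard_normal((200_000, 2))
g_noisy = np.sin(x_p[0] + noise[:, 0]) + 0.5 * (x_p[1] + noise[:, 1]) ** 2
monte_carlo = float(np.mean((f_p[0] - g_noisy) ** 2))
print(noiseless, predicted, monte_carlo)
```

For this small $\sigma$ the prediction agrees closely with the Monte Carlo value, whereas the noiseless error $E(x_p)$ alone noticeably underestimates it.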


3.3 Summary

It is appropriate at this stage to summarise the results of this section.

1. Suppose that we have a known function, $f(x)$, which we wish to approximate. In the problem considered in this paper, $\{x\}$ is the set of images of a point source in the absence of noise and $f(x)$ is the position of the source.

2. Suppose that we wish to approximate, in a least-squares sense, the function $f(x)$ by a function $g(z)$ which is defined for points $z$ outside the set $\{x\}$. For example, $z$ may be the image of a source corrupted by additive noise, i.e. $z = x + n$. Then:

   • the solution for $g(z)$ is given by

   $$g(z) = \frac{\int f(x)\, p_n(z - x)\, p(x)\, dx}{\int p_n(z - x)\, p(x)\, dx} \qquad (19)$$

   • this may be approximated by a finite sum

   $$\hat{g}(z) = \frac{\sum_{i=1}^{P} f_i\, p_n(z - x_i)}{\sum_{i=1}^{P} p_n(z - x_i)} \qquad (20)$$

   provided that the sample spacing of the points $x_i$ is small compared with the standard deviation of the noise probability density function;

   • if this is not so (the noise is small), then we assume a particular parametric form for $g(z)$ and choose the parameters which minimise an augmented sum-square error measure. This is equivalent to training on data representative of the operating conditions. That is, we may simulate the effects of noise by using the noiseless data with a modified error criterion.
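The simplest parametric form to illustrate the last point is a linear approximant $g(x) = Wx$, an assumption made for illustration since the networks of Section 4 are nonlinear. For this form $J_p = W$ and $q_p = 0$ for every pattern. With the normalisation of Equation (15), the augmented criterion therefore becomes $\frac{1}{P}\sum_p |f_p - Wx_p|^2 + \sigma^2\|W\|^2$, where $\|W\|^2$ is the sum of squared elements. In this linear case the criterion equals the noise-averaged error exactly. It is ridge regression, with minimiser $W = FX^t(XX^t + P\sigma^2 I)^{-1}$, where the columns of $X$ and $F$ are the $x_p$ and $f_p$. The sketch below computes this minimiser for hypothetical data.

```python
import numpy as np


def augmented_error(W, X, F, sigma):
    """(1/P) sum_p |f_p - W x_p|^2 + sigma^2 |W|^2 for the linear model g(x) = W x."""
    P = X.shape[1]
    residual = F - W @ X
    return np.sum(residual ** 2) / P + sigma ** 2 * np.sum(W ** 2)


def fit_linear_augmented(X, F, sigma):
    """Closed-form minimiser W = F X^t (X X^t + P sigma^2 I)^(-1).

    X: (n, P) noiseless inputs x_p as columns; F: (n', P) targets f_p as columns.
    """
    n, P = X.shape
    A = X @ X.T + P * sigma ** 2 * np.eye(n)     # symmetric, so A W^t = X F^t
    return np.linalg.solve(A, X @ F.T).T


rng = np.random.default_rng(0)
W_true = rng.standard_normal((2, 3))      # hypothetical underlying linear map
X = rng.standard_normal((3, 50))          # n = 3 inputs, P = 50 patterns
F = W_true @ X                            # n' = 2 targets per pattern

W_ls = fit_linear_augmented(X, F, sigma=0.0)    # ordinary least squares
W_reg = fit_linear_augmented(X, F, sigma=0.3)   # designed for input noise of sd 0.3
print(np.linalg.norm(W_ls), np.linalg.norm(W_reg))
```

Setting $\sigma = 0$ recovers the ordinary least-squares fit. Increasing $\sigma$ shrinks the weights, which is the stabilising effect of the second term in Equation (15).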


4 Feed-forward Adaptive Networks


Connectionist models based on feed-forward networks (for example, multilayer perceptrons (MLPs) [22] and radial basis function networks (RBFs) [3]) have been used with some success as static pattern classifiers on a wide range of problems. Such networks perform a nonlinear transformation from an $n$-dimensional input space to an $n'$-dimensional output space via a characterisation space. This space is defined by the outputs of the (final layer of) hidden units, and in it a specific feature extraction criterion is maximised [17, 29]. This feature extraction criterion may be viewed as a nonlinear multidimensional generalisation of Fisher's linear discriminant function. Training the network for a pattern classification task consists of presenting data vectors as input, together with suitably coded class labels at the output of the network, and minimising an error criterion. For a 1-from-$n'$ target coding scheme and the usual sum-square error criterion, the outputs of a trained network approximate the Bayes discriminant vector, giving the probability of each class given the input to the network [17].

An alternative to the pattern classification description of how adaptive, feed-forward layered networks such as the multilayer perceptron operate is the following. They perform well on certain tasks by exploiting their modelling flexibility to create an implicit interpolation surface in a high-dimensional space [3, 16]. In fact, it may be shown that multilayer feed-forward networks with a single hidden layer are universal approximators, in that an arbitrary continuous function can be approximated arbitrarily well, given enough hidden units [10, 24]. However, in a practical problem the mapping we wish to approximate is not known continuously. It is usually defined by a finite set of points in $\mathbb{R}^n \times \mathbb{R}^{n'}$ given by a training set. Specifically, consider mapping a finite set of $P$ $n$-dimensional ‘training’ patterns to the corresponding $n'$-dimensional ‘target’ patterns, $f: \mathbb{R}^n \to \mathbb{R}^{n'}$. One may think of this map as being generated by a ‘graph’ $\Gamma \subset \mathbb{R}^n \times \mathbb{R}^{n'}$, with the input and target pattern pairs as points on this graph. The learning phase of adaptive network training corresponds to the optimisation of a fitting procedure for $\Gamma$ based on knowledge of the data points. This is curve fitting in the generally high-dimensional space $\mathbb{R}^n \times \mathbb{R}^{n'}$. Thus generalisation becomes synonymous with interpolation along the constrained surface which is the ‘best’ fit to $\Gamma$ [31].

If there is noise in the expected operating conditions (on the test set), this must be taken into account when designing a fitting surface to the training data points. This was the problem addressed in [26], which showed how to construct a fitting surface that gives the expected value of the observation in $\mathbb{R}^{n'}$ given the data sample in $\mathbb{R}^n$.

The problem of point-source location is one of generalisation [26], in that we wish to generalise to data points (scaled and corrupted by noise) which are not in the training set. In this section we give a brief description of the structures we shall use to process the outputs of a focal-plane array radar.


[Figure 1: input layer, hidden layer and output layer; weights $\mu_{ij}$ (input to hidden) and $\lambda_{jk}$ (hidden to output).]

Figure 1: A schematic diagram of the standard feed-forward adaptive layered network geometry considered in this paper.

The structure of a standard layered network model is depicted in Figure 1. It is envisaged that input data may be represented by an arbitrary (real-valued) $n$-dimensional vector, $x$, or an ordered sequence of $n$ real-valued numbers, $\{x_i;\ i = 1, \ldots, n\}$. Thus there are $n$ independent input nodes to the network, which accept each input data vector. Each input node is totally connected to a set of $n_0$ ‘hidden’ nodes (hidden from direct interaction with the environment). Associated with each link between the $i$-th input node and the $j$-th hidden node is a scalar $\mu_{ij}$. Usually, the fan-in to a hidden node takes the form of a hyperplane: the input to node $j$ is of the form $\theta_j = \sum_{i=1}^{n} \mu_{ij} x_i = \mu_j^t x$, where $\mu_j$ is the vector of $n$ scalar values associated with hidden node $j$ and $t$ denotes transpose. Each hidden node accepts the value provided by the fan-in. It outputs the value obtained by passing this through a transfer function, which is generally, though not necessarily, nonlinear:

$$\phi_j = \phi(\mu_{0j} + \theta_j) = \phi(\mu_{0j} + \mu_j^t x) \qquad (21)$$

where $\mu_{0j}$ is a local ‘bias’ associated with each hidden node. In principle, the input data vector may be an $n$-dimensional complex-valued vector, with the nonlinearity defined to map complex input to real-valued output. However, in this paper we shall consider only real-valued input data vectors.
The hidden layer is fully connected to a set of $n'$ output nodes corresponding to the components of an $n'$-dimensional output space. The strength of the connection from the $j$-th hidden node to the $k$-th output node is denoted $\lambda_{jk}$. Thus the value received at the $k$-th output node is a weighted sum of the output values from all of the hidden nodes, $\sum_{j=1}^{n_0} \lambda_{jk}\phi_j$.

In general the output from the $k$-th output node will be a nonlinear function of its input, $o_k = \Phi_k\left(\lambda_{0k} + \sum_{j=1}^{n_0} \lambda_{jk}\phi_j\right)$, where $\lambda_{0k}$ is a ‘bias’ associated with that output node.

Thus the networks provide a transformation mapping from an $n$-dimensional input space to an $n'$-dimensional output space via an intermediate characterisation space. This mapping is totally defined by the topology of the network (in particular, how many hidden units are employed), once all the nonlinear transfer functions are specified and the set of weights and biases $\{\lambda, \mu\}$ has been determined. This set of weights and biases is found by a ‘training’ procedure.

Networks performing a transformation from an $n$-dimensional input space to an $n'$-dimensional output space using more than one intermediate hidden layer have been considered by some workers [19], but we shall restrict our attention in this paper to networks with a single hidden layer.

The network will operate once a set of weight values $\{\lambda_{jk}, \mu_{ij}\}$ has been determined. This set is conditional upon training data presented in the form of representative input and corresponding target output patterns. The set of parameters $\{\lambda_{jk}, \mu_{ij}\}$ is chosen so that the actual outputs of the network, $\{\mathbf{o}^p, p = 1, 2, \ldots, P\}$, for a given set of inputs, $\{\mathbf{x}^p, p = 1, 2, \ldots, P\}$, are 'close' in some sense to the desired target values, $\{\mathbf{t}^p, p = 1, 2, \ldots, P\}$. Usually, this error criterion is a sum-of-squares error of the form

$$E = \sum_{p=1}^{P} \left| \mathbf{t}^p - \mathbf{o}^p \right|^2 \qquad (22)$$


where the summation runs over all the patterns in the training set. Using the Euclidean distance function and expressing the outputs in terms of the set of weights and biases and the inputs, the error may be written explicitly as a function of the set $\{\lambda_{jk}, \mu_{ij}\}$. For instance, in the case of the standard multi-layer perceptron, this error may be expressed as

$$E = \sum_{p=1}^{P} \sum_{k=1}^{n'} \left\{ t_k^p - \Phi_k\Big( \lambda_{0k} + \sum_{j=1}^{n_0} \lambda_{jk}\, \phi\Big( \mu_{0j} + \sum_{i=1}^{n} \mu_{ij} x_i^p \Big) \Big) \right\}^2 \qquad (23)$$
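A minimal sketch of Equations (22) and (23), assuming logistic hidden units and linear output nodes (the choice made in Section 5.3) rather than a general $\Phi_k$. The function name and the row-per-pattern array layout are illustrative assumptions.

```python
import numpy as np


def sum_squared_error(X, T, mu0, mu, lam0, lam):
    """E of Equation (22), expanded as in Equation (23) with linear outputs.

    X : (P, n) input patterns x^p, one per row
    T : (P, n') target patterns t^p, one per row
    """
    hidden = 1.0 / (1.0 + np.exp(-(mu0 + X @ mu)))   # (P, n0) hidden-node outputs
    outputs = lam0 + hidden @ lam                    # (P, n') linear outputs o^p
    return float(np.sum((T - outputs) ** 2))         # sum over patterns and output nodes
```

Evaluating all patterns as one matrix product keeps the double sum of Equation (23) implicit in the array shapes.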


If the training data is not representative of the test data and we wish to derive an approximation to the mapping from $\mathbb{R}^n$ to $\mathbb{R}^{n'}$ defined by the training data for which the sum-squared error in operation is a minimum, then the error function used during training must be modified to take account of the discrepancies between training and operating conditions [26].

The expressions for the error (Equation (22), or the form of Equation (15) modified to take account of expected noise on the data) are differentiable nonlinear functions of the parameters, and the aim of any training procedure is to find a minimum of this function. Therefore, some strategy for nonlinear function minimisation must be employed. Of course, a global minimum cannot be guaranteed. Nevertheless, it may be possible to obtain a good local minimum. Also, not only do we require a good solution, but it must be obtained 'within a reasonable timescale'. Schemes which find a good solution a small percentage of the time, but are very fast, may be preferable to one which finds a good solution on most occasions but takes a long time to do so. Optimisation strategies for nonlinear functions have been discussed in previous papers [30, 28]. These were applied to the training of adaptive feed-forward networks and various example problems were considered. For the problem of point source location using a 4 × 4 focal-plane array, the best solution (in terms of the smallest mean error on test) was obtained using the Broyden-Fletcher-Goldfarb-Shanno (BFGS) optimisation scheme. This is the method which we shall use in Section 5.3.
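The paper does not give its BFGS implementation. As a hedged sketch, SciPy's general-purpose `scipy.optimize.minimize(..., method="BFGS")` can stand in for the BFGS scheme, minimising the sum-squared error of Equation (22) over a flattened vector of weights and biases started uniformly on (-1, 1). The helper names (`unpack`, `sse`, `train_bfgs`) are hypothetical, and the gradient is left to SciPy's finite-difference approximation for brevity; a practical implementation would supply the analytic gradient of Equation (23).

```python
import numpy as np
from scipy.optimize import minimize


def unpack(w, n, n0, n_out):
    """Split a flat parameter vector into (mu0, mu, lam0, lam)."""
    i = 0
    mu0 = w[i:i + n0]
    i += n0
    mu = w[i:i + n * n0].reshape(n, n0)
    i += n * n0
    lam0 = w[i:i + n_out]
    i += n_out
    lam = w[i:i + n0 * n_out].reshape(n0, n_out)
    return mu0, mu, lam0, lam


def sse(w, X, T, n0):
    """Sum-squared error (Equation (22)) for logistic hidden units and linear outputs."""
    mu0, mu, lam0, lam = unpack(w, X.shape[1], n0, T.shape[1])
    hidden = 1.0 / (1.0 + np.exp(-(mu0 + X @ mu)))
    return float(np.sum((T - (lam0 + hidden @ lam)) ** 2))


def train_bfgs(X, T, n0, rng):
    """One BFGS run from weights drawn uniformly on (-1, 1)."""
    n, n_out = X.shape[1], T.shape[1]
    n_params = n0 + n * n0 + n_out + n0 * n_out
    w0 = rng.uniform(-1.0, 1.0, size=n_params)
    result = minimize(sse, w0, args=(X, T, n0), method="BFGS")
    return result.x, float(result.fun)
```

For example, `train_bfgs(X, T, 3, np.random.default_rng(0))` fits a 3-hidden-unit network to one-dimensional toy data; because BFGS only accepts steps that decrease the objective, the returned error never exceeds the error at the random start.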

Testing the network consists of applying the trained network to patterns not previously used as part of the training set and comparing the outputs with the labels corresponding to those patterns. It is not sufficient to consider how closely the network models the training set alone since, if it models the training set too well, the network may not have captured the underlying structure of the data and may be unable to generalise to unseen data.


5  Feed-forward  Network  Estimation  of  Source  Position 


In this section, we consider the application of feed-forward adaptive networks to point source location using focal-plane arrays. The method may be applied to any array of sensors where the image response function may be characterised by an array manifold. However, in order to be specific, we have confined our study to the focal-plane situation and one idealised array in particular, namely a 5 × 1 array of elements, each with a sin(z)/z shaped point-spread function (Equation 3). Thus, the array manifold consists of a set of real-valued vectors. In each example, the distance between adjacent elements in the focal-plane is unity, giving samples of the image at the Nyquist rate. Figure 2 illustrates the response of each receiving element to a point source in the far field for the linear array. The distance between the peak of a response and the first null is termed the 'beamwidth' and is equal to unity for these examples.

Section 5.1 describes the data used for training and testing the network. Sections 5.2 and 5.3 describe feed-forward network estimators of position. Section 5.4 assesses a maximum likelihood approach, which provides a reference by which to judge the feed-forward network technique. Section 5.5 gives results for the bias in the estimate of the position of a source as a function of position for both the linear and square arrays, and compares the results with the maximum likelihood method.


5.1  Generation  of  Data 

Training and test sets have been generated for the linear array, with each set consisting of a set of images of single point sources of unit amplitude (used as input to a network), together with the source positions (taken to be the target data). Thus, the input dimension of the network, $n$, is taken equal to the dimension of the receiver array, $N$. For all experiments, the size of the linear array was fixed to contain 5 receivers at Nyquist spacing in the focal-plane. Thus, the set of input data, $\{\mathbf{x}^p\}$, is a set of representative images, $\{\mathbf{h}(\theta_p)\}$, with the corresponding targets, $t^p$, being the positions $\theta_p$.
[Figure 2 plot: Beamshapes for a linear array]

Figure 2: Response of each receiving element to a point source in the far-field for a 5 × 1 linear array of receivers in the focal-plane of an imaging system.

For the linear array considered, the images of a single source are calculated using Equation 3 at 101 different positions, equally spaced across the field of view of the array from -2.5 to 2.5 (at a spacing of 1/20). For the test data, the images of a single source at 200 positions chosen randomly between -2.0 and 2.0 are taken as input, with the source position as target.
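Equation 3 is not reproduced in this section, so the following data-generation sketch assumes a specific form for it: element $k$ is centred at $c_k \in \{-2, -1, 0, 1, 2\}$ and responds as $\sin(\pi u)/(\pi u)$ with $u = \theta - c_k$, which places the first null one unit (one beamwidth) from the peak. NumPy's `np.sinc` computes exactly this normalised sinc. The names `CENTRES`, `image` and `make_sets` are hypothetical.

```python
import numpy as np

CENTRES = np.arange(-2.0, 3.0)   # assumed element centres: -2, -1, 0, 1, 2


def image(theta, amplitude=1.0):
    """Noise-free images of point sources at the positions in theta (one row each).

    Assumes each element's point-spread function is sin(pi u) / (pi u),
    u = theta - c_k, i.e. unit element spacing and unit beamwidth.
    """
    offsets = np.subtract.outer(np.atleast_1d(theta), CENTRES)   # (P, 5)
    return amplitude * np.sinc(offsets)


def make_sets(rng, n_train=101, n_test=200):
    """Training images on an equally spaced grid; test images at random positions."""
    theta_train = np.linspace(-2.5, 2.5, n_train)    # spacing 1/20
    theta_test = rng.uniform(-2.0, 2.0, n_test)      # random test positions
    return (image(theta_train), theta_train), (image(theta_test), theta_test)
```

A source at $\theta = 0$ falls on the null of every element except the centre one, so its image is (0, 0, 1, 0, 0) to rounding error.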

The focal-plane array illustration described is highly idealised. In general, the array manifold, and the image vectors, would be complex vectors, because the phase of the source and the relative phase between receivers are functions of source position; some method of incorporating complex vectors into a feed-forward network would therefore have to be considered. This is not a difficult task, but for our purposes we shall restrict the example to considering real vector inputs only.


5.2  Radial  Basis  Function  Approximation 

We now derive, using the training data, an estimate of source position which is a nonlinear function of the measured image vector, $\mathbf{z}$. A naive application of Equation (10) (with the $\mathbf{x}_i$ taken to be the data points and the $f_i$ the target points) is inappropriate for the point source location problem. This is because, for a single point source in the scene, the measured image, $\mathbf{z}$, is not simply a point $\mathbf{x}$ on the array manifold corrupted by noise; it is a scaled version of a point on the array manifold corrupted by noise, i.e.

$$\mathbf{z} = A\mathbf{x} + \mathbf{n} \qquad (24)$$

where $A$ is the amplitude of the source. Thus,

$$p(\mathbf{z} \mid \mathbf{x}, A) = p_n(\mathbf{z} - A\mathbf{x}), \qquad (25)$$

where $p_n$ is the noise probability density function.

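The solution for $g(\mathbf{z})$ which minimises the variance now involves the prior probability density function of the amplitude, $A$,

$$g(\mathbf{z}) = \frac{\iint f(\mathbf{x})\, p_n(\mathbf{z} - A\mathbf{x})\, p(A)\, p(\mathbf{x})\, d\mathbf{x}\, dA}{\iint p_n(\mathbf{z} - A\mathbf{x})\, p(A)\, p(\mathbf{x})\, d\mathbf{x}\, dA} \qquad (26)$$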


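For a Gaussian noise process with a diagonal covariance matrix whose diagonal elements are all equal to $\sigma^2$,

$$p_n(\mathbf{n}) = \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left[-\frac{1}{2\sigma^2}|\mathbf{n}|^2\right] \qquad (27)$$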


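and assuming that $p(A)$ is uniformly distributed, integrating (over $(0, \infty)$) with respect to $A$ gives

$$g(\mathbf{z}) = \frac{\int f(\mathbf{x})\, s(\mathbf{x}, \mathbf{z})\, p(\mathbf{x})\, d\mathbf{x}}{\int s(\mathbf{x}, \mathbf{z})\, p(\mathbf{x})\, d\mathbf{x}} \qquad (28)$$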


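where

$$s(\mathbf{x}, \mathbf{z}) = \frac{1}{|\mathbf{x}|} \exp\left\{-\frac{1}{2\sigma^2}\, \mathbf{z}^t \left(I - \frac{\mathbf{x}\mathbf{x}^t}{|\mathbf{x}|^2}\right) \mathbf{z}\right\} \left(1 \pm \operatorname{erf}\left(\frac{|\mathbf{z}^t\mathbf{x}|}{\sqrt{2}\,\sigma\,|\mathbf{x}|}\right)\right) \qquad (29)$$

where the $+$ sign is taken if $\mathbf{z}^t\mathbf{x} > 0$ and the minus sign if $\mathbf{z}^t\mathbf{x} < 0$.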
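The step from Equation (26) to Equations (28) and (29) is not spelled out above. A sketch of it follows, assuming (as stated) a uniform, improper prior on $A$ over $(0, \infty)$ and the white Gaussian noise of Equation (27); here $\mathbf{x}$ is a point on the array manifold and $\mathbf{z}$ the measured image.

```latex
% Completing the square in A
|\mathbf{z} - A\mathbf{x}|^2
  = |\mathbf{x}|^2 \left(A - a\right)^2
  + \mathbf{z}^t\left(I - \frac{\mathbf{x}\mathbf{x}^t}{|\mathbf{x}|^2}\right)\mathbf{z},
  \qquad a = \frac{\mathbf{x}^t\mathbf{z}}{|\mathbf{x}|^2}

% Gaussian integral over the positive half-line
\int_0^\infty \exp\left[-\frac{|\mathbf{x}|^2 (A - a)^2}{2\sigma^2}\right] dA
  = \frac{\sigma}{|\mathbf{x}|}\sqrt{\frac{\pi}{2}}
    \left[1 + \operatorname{erf}\left(\frac{\mathbf{x}^t\mathbf{z}}{\sqrt{2}\,\sigma\,|\mathbf{x}|}\right)\right]
```

The factors $\sigma\sqrt{\pi/2}$, $(2\pi\sigma^2)^{-N/2}$ and the constant prior density are the same for every $\mathbf{x}$, so they cancel between numerator and denominator of Equation (26), leaving $s(\mathbf{x}, \mathbf{z})$. Since erf is odd, $1 + \operatorname{erf}(y) = 1 \pm \operatorname{erf}(|y|)$ with the sign of $\mathbf{x}^t\mathbf{z}$, which is the sign rule attached to Equation (29).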


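Approximating the integrals with respect to $\mathbf{x}$ by a summation over the training set (this implicitly assumes that all angles are equally likely, since the training data is sampled uniformly in angle space),

$$g(\mathbf{z}) \approx \frac{\sum_i f(\mathbf{x}_i)\, s(\mathbf{x}_i, \mathbf{z})}{\sum_i s(\mathbf{x}_i, \mathbf{z})} \qquad (30)$$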


This approximation is valid provided that the function $s(\mathbf{x}, \mathbf{z})$ is sampled on a scale which is small compared to the standard deviation, $\sigma$; that is, we require


(31) 
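A hedged sketch of the estimator of Equation (30), using $s(\mathbf{x}, \mathbf{z})$ from Equation (29). The kernel values are combined in the log domain (a standard numerical safeguard added for this sketch, not part of the text) because for small $\sigma$ every $s(\mathbf{x}_i, \mathbf{z})$ can underflow to zero. The identity $1 + \operatorname{erf}(y) = \operatorname{erfc}(-y)$ covers both branches of the $\pm$ rule. The function and argument names are illustrative.

```python
import math

import numpy as np


def rbf_position(z, X, theta, sigma):
    """Position estimate g(z) of Equation (30).

    z     : (N,) measured image
    X     : (P, N) training images x_i of unit-amplitude sources, one per row
    theta : (P,) training positions f(x_i)
    sigma : noise standard deviation
    """
    norms = np.linalg.norm(X, axis=1)
    proj = X @ z                                    # x_i^t z
    resid = z @ z - proj ** 2 / norms ** 2          # z^t (I - x x^t / |x|^2) z
    y = proj / (math.sqrt(2.0) * sigma * norms)
    tail = np.array([math.erfc(-v) for v in y])     # 1 + erf(y), either sign of y
    with np.errstate(divide="ignore"):              # log(0) -> -inf gives zero weight
        log_s = -np.log(norms) - resid / (2.0 * sigma ** 2) + np.log(tail)
    weights = np.exp(log_s - np.max(log_s))         # rescale to avoid underflow
    return float(np.sum(weights * theta) / np.sum(weights))
```

Because each training vector contributes in proportion to its kernel value, the estimate is a weighted average of nearby training positions; when $\sigma$ is small compared with the spacing of the training vectors, the weights collapse onto a single training point, which is the failure mode described in Section 5.5.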


5.3  Multilayer  Perceptron  for  High  Signal-to-noise  Ratio 

In a high signal-to-noise ratio situation, the approximation given by Equation (30) becomes increasingly invalid. Therefore, we choose to approximate the function by a particular parametric transformation and determine the parameters by an appropriate minimisation procedure. The particular functional form we have chosen is a feed-forward network having a single hidden layer with input in the form of a hyperplane and a nonlinear transfer function $\phi(z) = 1/(1 + e^{-z})$. In fact, it may be shown that multilayer feed-forward networks with a single hidden layer are universal approximators, in that any continuous function on a compact domain can be approximated arbitrarily well [10, 24]. However, in a practical problem, the mapping we wish to approximate is not known continuously but is usually defined by a finite set of points in $\mathbb{R}^n \times \mathbb{R}^{n'}$ given by a training set. The output nodes are taken to be linear functions, $\Phi(z) = z$.
As in the radial basis function example above, we wish to define an approximant which not only generalises to noisy data not in the training set, but also is relatively insensitive to source amplitude. Therefore, we choose to normalise the data vectors to be of unit magnitude on input to the network. This removes, at least in a high signal-to-noise situation, the effect of fluctuations of image vector magnitude due to source amplitude fluctuations. Thus the network used is that depicted in Figure 3: an input normalisation layer, a hidden layer and a linear output layer.

[Figure 3 diagram: Input layer, Normalisation layer, Hidden layer, Output layer]

Figure 3: A feed-forward adaptive layered network with an input normalisation layer.
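To illustrate why the normalisation layer makes the network insensitive to source amplitude, the sketch below (hypothetical names, arbitrary weights) builds the three-stage network of Figure 3 in NumPy. Scaling the input by any positive factor leaves the output unchanged; note that the invariance holds only for positive scale factors, since a sign flip would change the normalised input.

```python
import numpy as np


def normalised_mlp(x, mu0, mu, lam0, lam):
    """Network of Figure 3: normalisation layer, logistic hidden layer, linear outputs."""
    r = x / np.linalg.norm(x)                        # normalisation layer: unit-magnitude input
    hidden = 1.0 / (1.0 + np.exp(-(mu0 + r @ mu)))   # logistic hidden layer
    return lam0 + hidden @ lam                       # linear output layer, Phi(v) = v
```

For example, `normalised_mlp(3.7 * x, ...)` and `normalised_mlp(x, ...)` agree to rounding error for any weights.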


For a multilayer perceptron with a single hidden layer and the sum-squared error criterion, the regularisation term, Equation (16), may be written in terms of the weights using the results that


(33) 

where $\phi_j^p$ is the output of the $j$th hidden node for input pattern $p$ and $r_j^p$ represents the $j$th component of the normalisation layer. The scalar quantities $\lambda_{ji}$ and $\mu_{kj}$ are the weights between the $i$th output node and the $j$th hidden node, and between the $j$th hidden node and the $k$th input node, respectively.
For a given value of $n_0$ and a given set of training data, the network was trained to minimise the sum-squared error using the procedure described in Section 4. For the source location problem, the error between the outputs of the network and the targets which is minimised has a physical interpretation: it is the sum of the squares of the errors in position estimation. Initially, the values of the network weights were chosen randomly from a uniform distribution on (-1.0, 1.0). Then the BFGS nonlinear optimisation strategy was used to find the weights for which the mean square error at the output of the network is a minimum. The network was tested using the generated test data and the normalised error on test was calculated. The experiment was run for 100 different random start configurations for the weights. The solution for the weights which gave the lowest normalised error on the training set over the 100 experiments was chosen as the one which best describes the mapping from image space to position space for the particular network under consideration. This solution is the one used in the analysis of the performance of the network in Section
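The multi-start selection rule can be sketched independently of the optimiser. Here `train_once` is a hypothetical callback that draws a random start and returns the trained weights together with their training error (for instance, a BFGS run started from weights drawn uniformly on (-1, 1)); `best_of_starts` is likewise an illustrative name.

```python
import numpy as np


def best_of_starts(train_once, n_starts, seed=0):
    """Run train_once(rng) -> (weights, training_error) n_starts times and keep
    the solution with the lowest training error."""
    rng = np.random.default_rng(seed)
    best_w, best_err = None, np.inf
    for _ in range(n_starts):
        w, err = train_once(rng)
        if err < best_err:            # keep only the best solution seen so far
            best_w, best_err = w, err
    return best_w, best_err
```

Passing a single generator through all runs keeps the experiment reproducible from one seed while still giving each start a different initialisation.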


5.4  Maximum  Likelihood  Solution 


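Before we give results for the radial basis function and the multilayer perceptron network estimators of position, we consider a maximum likelihood approach to position estimation. It is shown in Appendix A that, for a measured image $\mathbf{z}$ with noise covariance matrix $\mathbf{N}$, the maximum likelihood estimate of position is that value of $\theta$ for which the quadratic form, $Q$, given by

$$Q(\theta) = \frac{\left(\mathbf{h}(\theta)^t \mathbf{N}^{-1} \mathbf{z}\right)^2}{\mathbf{h}(\theta)^t \mathbf{N}^{-1} \mathbf{h}(\theta)} \qquad (34)$$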


is a maximum. In principle, the maximum of $Q$ may be found using some nonlinear optimisation strategy. However, since in general we do not know the function $\mathbf{h}(\theta)$ continuously, but only at a finite set of points determined by a calibration procedure and given as the training set, the value of the quadratic form can only be evaluated at these positions. For the training set considered in this paper, these data points are equally spaced in position. One estimate of source position would be to take the position at which $Q$ is greatest. This would give an estimate of position to an accuracy determined by the sample spacing. A more accurate estimate of position would be to interpolate the sample values and adopt the position of the peak of the interpolating function as the estimate of position. This was the procedure adopted in [27].

A maximum likelihood method has been implemented for the 5 × 1 array, with a data set consisting of a set of images of sources at equally spaced positions. The set of data vectors contains 101 images of dimension 5, together with associated positions, equally spaced across the field of view from -2.5 to 2.5 (i.e. a spacing of 1/20). The procedure for determining the maximum likelihood estimate of position given the image of a single source corrupted by noise is:


1.  Calculate  the  value  of  the  quadratic  form  (34)  at  each  point  of  the  training  set. 

2.  Find  the  peak  value. 

3.  Fit  a  quadratic  function  to  the  3  data  points  centred  on  the  peak  position. 

4.  Select  the  position  of  the  source  as  the  peak  of  the  interpolating  quadratic  function. 
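Steps 1-4 above can be sketched in NumPy as follows. This assumes that Equation (34) takes the form $Q(\theta) = (\mathbf{h}(\theta)^t \mathbf{N}^{-1}\mathbf{z})^2 / (\mathbf{h}(\theta)^t \mathbf{N}^{-1}\mathbf{h}(\theta))$ and that the noise is white, $\mathbf{N} = \sigma^2 I$, so that $\sigma^2$ cancels and need not be supplied. The function name is hypothetical, and the clamping of the three-point stencil at the ends of the grid is a choice made for this sketch.

```python
import numpy as np


def ml_position(z, H, theta):
    """Maximum likelihood position estimate by steps 1-4.

    z     : (N,) measured image
    H     : (P, N) calibration images h(theta_r), one per row
    theta : (P,) equally spaced calibration positions
    Assumes white noise, so Q(theta) = (h^t z)^2 / (h^t h) and sigma^2 drops out.
    """
    q = (H @ z) ** 2 / np.einsum("ij,ij->i", H, H)   # step 1: Q at every grid point
    r = int(np.argmax(q))                            # step 2: index of the peak sample
    r = min(max(r, 1), len(theta) - 2)               # keep the 3-point stencil on the grid
    q_minus, q_0, q_plus = q[r - 1], q[r], q[r + 1]
    curvature = q_minus - 2.0 * q_0 + q_plus
    # steps 3-4: vertex of the parabola through the three samples, in grid units
    offset = 0.0 if curvature == 0.0 else 0.5 * (q_minus - q_plus) / curvature
    return float(theta[r] + offset * (theta[1] - theta[0]))
```

The vertex formula follows from fitting $q(u) = au^2 + bu + c$ through $u = -1, 0, 1$, which gives the peak at $u = -b/2a = (q_{-} - q_{+}) / (2(q_{-} - 2q_0 + q_{+}))$. Scaling $\mathbf{z}$ by any non-zero amplitude scales every $Q$ sample by the same factor, so the estimate does not depend on source amplitude either.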
Figure 4: The root of the total square error as a function of position for the linear array and the maximum likelihood method for a value of $\sigma^2$ of $10^{-2}$ (upper curves) and $10^{-3}$ (lower curves).


Figure 4 plots the root mean square error in position as a function of position for the maximum likelihood method for values of $\sigma^2$ of $10^{-2}$ and $10^{-3}$. The solid lines are the analytic approximations derived in Appendix A.


where $\theta_B$ is the beamwidth (unity in this example) and is a function of $\theta$. The dashed lines are the result of Monte-Carlo simulations based on 5000 images at each position. The estimate of position was made using the method described above.
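The Monte-Carlo evaluation can be sketched generically: add white Gaussian noise of variance $\sigma^2$ to the noise-free image of a source at a fixed position, apply an estimator to each noisy image, and take the root mean square deviation from the true position. The estimator is passed in as a function, so any of the three methods could be plugged in; the names are hypothetical, and the default of 5000 trials mirrors the count quoted above.

```python
import numpy as np


def monte_carlo_rms(estimator, clean_image, theta_true, sigma2, n_trials=5000, seed=0):
    """Root mean square position error at one source position."""
    rng = np.random.default_rng(seed)
    # one row of white Gaussian noise (variance sigma2) per trial
    noise = rng.normal(0.0, np.sqrt(sigma2), size=(n_trials, clean_image.size))
    estimates = np.array([estimator(clean_image + n) for n in noise])
    return float(np.sqrt(np.mean((estimates - theta_true) ** 2)))
```

Repeating this over a grid of source positions gives curves of the kind plotted in Figure 4; with 5000 trials the relative sampling error of each RMS value is roughly $1/\sqrt{2 \times 5000} \approx 1\%$.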

For both values of signal-to-noise ratio, there is very good agreement between the results obtained using the Monte-Carlo simulation and the high signal-to-noise theoretical predictions. At lower signal-to-noise ratios, we would expect the deviation between the simulation and the theory to increase, since the analytic approximation for the error in the appendix was derived for a high signal-to-noise ratio regime. Also, at very high signal-to-noise ratios, there would be deviation between theory and experiment. This is because there is a limit on the error (even in the absence of noise) imposed by the approximate nature of the maximum likelihood solution, which is based on a finite number of samples of the point-spread function and a quadratic interpolation to the quadratic form, $Q$. We have found that the bias introduced by sampling the point-spread function and quadratic
interpolation  is  less  than  7.0  x  10~®  over  the  central  region  of  the  field  of  view.  This  is  much 
smaller  than  the  noise  errors  for  the  values  of  signal-to-noise  ratio  used  in  the  illustrations 
and  only  becomes  similar  to  the  noise  error  at  signal-to-noise  ratios  of  about  10®. 


5.5  Feed-forward  Network  Results 

Figures 5 and 6 plot the variance in the radial basis function approximator for several values of $\sigma^2$, again obtained using a Monte-Carlo simulation. Figure 5 plots the variance over the field of view for the same values of $\sigma^2$ used to illustrate the maximum likelihood estimator.
Figure 5: The root of the total square error as a function of position for the linear array and a Radial Basis Function network trained with $\sigma^2 = 10^{-2}$ (solid line) and $\sigma^2 = 10^{-3}$ (dashed line).


The performance is very similar to that of the maximum likelihood method. Figure 6 is a zoomed-in plot of the central region of the field of view for three values of $\sigma^2$. It shows that for $\sigma^2 = 10^{-4}$, the network is unable to interpolate between points in the training set, hence the saw-tooth shape of the error. It peaks at a value of 0.025 (which is half of the sample spacing in the data set). The reason for this failure is that the approximation given by Equation (30) is not valid, since the noise variance is smaller than the sample spacing of the data vectors.

Therefore, in order to achieve estimates of position more accurate than that permitted by the spacing of points in the training set, a parametric form must be adopted for the approximating function. The parameters of this function may then be obtained by a suitable optimisation strategy which minimises an error between the approximation and the desired values.

Multilayer perceptron results are given in Figures 7-13. Several multilayer perceptron networks, each with a single hidden layer with a different number of hidden units, were trained and the normalised error¹ on the training and test sets calculated. In the first instance, the networks were trained to minimise the sum-squared error between the actual output of the network and the desired output for the training set. Figure 7 plots the normalised errors as a function of the number of hidden units. A normalised error of $10^{-2}$ on the test set corresponds to a root mean sum-squared error in position of $1.15 \times 10^{-2}$ of a beamwidth, and a normalised error of $10^{-4}$ on the test set corresponds to a root mean sum-squared error in position of $1.15 \times 10^{-4}$. The figure shows that the training error is a monotonically decreasing function of the number of hidden units, whilst the test error decreases up to 5 hidden units and then begins to fluctuate. Therefore, we have selected a network with 5 hidden units to illustrate the results. Figure 8 plots the bias in position (the difference between the actual position and the position predicted using the network) as a function of actual source position for a trained network with 5 hidden units over the test interval, $[-2.0, 2.0]$. The normalised error on the test set is $2.09 \times 10^{-4}$ and corresponds to a root of the mean sum-squared error in position of $2.4 \times 10^{-4}$ of a beamwidth, and the peak value

¹ The normalised error is the square root of the ratio of the mean sum-squared error to the variance in the target values [30].
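As a check on the conversion quoted above: if the test targets are taken to be uniformly distributed on [-2, 2] (an assumption; the actual test set is 200 random draws, so its sample variance will differ slightly), their variance is $4^2/12 = 4/3$, and the root mean square position error is the normalised error multiplied by $\sqrt{4/3} \approx 1.155$. The helper name below is hypothetical.

```python
import math


def rms_from_normalised(normalised_error, lo=-2.0, hi=2.0):
    """RMS position error for a given normalised error (footnote definition),
    assuming targets uniformly distributed on [lo, hi], variance (hi - lo)^2 / 12."""
    return normalised_error * math.sqrt((hi - lo) ** 2 / 12.0)
```

This reproduces the figures in the text: `rms_from_normalised(1e-2)` is about 0.01155 (the quoted $1.15 \times 10^{-2}$), and `rms_from_normalised(2.09e-4)` is about $2.41 \times 10^{-4}$ (the quoted $2.4 \times 10^{-4}$).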




Point-source  Location 


Figure 6: The root of the total square error as a function of position for the linear array and a Radial Basis Function network trained with $\sigma^2 = 10^{-2}$ (solid line), $\sigma^2 = 10^{-3}$ (dashed line) and $\sigma^2 = 10^{-4}$ (dotted line).


Figure 7: $\log_{10}$(normalised error) on the training set (solid line) and the test set (dashed line) for the linear array.
Figure 8: Bias (×1000) as a function of position for a linear array and a network with 5 hidden units, trained on 101 patterns.

of the bias error is $9.0 \times 10^{-4}$ of a beamwidth. These errors are very small and therefore, from the experiments with the linear data, we conclude that it is possible to achieve a very accurate nonlinear mapping from the image vector to position. However, there will be errors in the position estimate due to noise on the inputs. Figure 9 plots the standard deviation in the position estimate, $\langle(\hat{\theta} - \theta_0)^2\rangle^{1/2}$, obtained using a Monte-Carlo simulation for noise on the inputs of value $\sigma^2 = 10^{-3}$. It is immediately apparent that this is significantly greater than the corresponding error for the maximum likelihood method or the radial basis function approximation. The reason for this is that the network has been trained to minimise the sum-squared error on a training set which is not representative of the data used to test the network (i.e. the training set is noiseless), and so has not been trained for the operating conditions of noisy images.

Figure 9: The root of the total square error as a function of position for the linear array and a network with 5 hidden units for noise on the inputs with value $\sigma^2 = 10^{-3}$.

The effects of training on noisy vectors (i.e. data representative of the expected operating conditions) may be simulated by training on the noiseless data but modifying the error criterion used for training. In the final experiments illustrated here, a multilayer perceptron was trained to minimise the augmented error given by Equation (15). A value of $10^{-3}$ was taken for $\sigma^2$ and the results are given in Figures 10 and 11. Figure 10 plots the bias as a function of position. This is considerably greater than that shown in Figure 8, but the root of the total error (calculated using a Monte-Carlo simulation with input noise of $10^{-3}$ and given in Figure 11) is reduced compared to Figure 9. Thus, it is possible to reduce the total squared error in position for a multilayer perceptron operating on noisy data by taking into account the expected operating conditions during the training procedure. The errors are still not as small as the errors given by the maximum likelihood method or the radial basis function network, but they can be reduced further by the addition of more hidden units. A multilayer perceptron with 25 hidden units reduces the total square error (averaged across the test set) from $1.78 \times 10^{-3}$ obtained for 5 hidden units to 5.04 x 10~*. Results for 25 hidden units are given in Figures 12 and 13. Also, since the multilayer perceptron is making a global fit to the data, it is not as sensitive to the sample spacing as the radial basis function network. Of course, increasing the number of hidden units (and hence the number of free parameters to adjust) may lead to overfitting of the data.

Figure 10: Bias (×10) as a function of position for a linear array and a network with 5 hidden units, trained on 101 patterns, and with a value of $\sigma^2$ of $10^{-3}$.

Figure 11: The root of the total square error as a function of position for the linear array and a network with 5 hidden units.

Figure 12: Bias (×10) as a function of position for a linear array and a network with 25 hidden units, trained on 101 patterns, and with a value of $\sigma^2$ of $10^{-3}$.

Figure 13: The root of the total square error as a function of position for the linear array and a network with 25 hidden units.

We conclude this section with a short discussion of the three methods which we have considered in this paper. The main points are summarised in Table 1. Firstly, both the maximum likelihood method and the radial basis function network require storage of the point-spread function; that is, all the training data is required for implementation of the methods. For the problem considered in this paper, or indeed even for the two-dimensional array, the amount of data is not excessive. However, this may not be so in problems where the input and output dimensions are large. The multilayer perceptron, on the other hand, parameterises the point-spread function in the weights of the network².


² Of course, a radial basis function network could be constructed with a reduced number of centres [3], since it is not necessary to have a centre at every data point. However, in these comparisons, we consider
Property                | RBF                | MLP         | ML
------------------------|--------------------|-------------|---------------------
Storage requirements    | all training data  | weights     | all training data
Global or local method  | local              | global      | local
SNR regime              | low SNR            | high SNR    | any value
SNR dependence          | requires σ²        | requires σ² | does not require σ²

Table 1: Summary of properties of the methods discussed in this paper.


The  multilayer  perceptron  performs  a  global  fit  to  the  training  data.  The  maximum 
likelihood  method  is  a  local  method  in  that,  once  the  position  corresponding  to  the  peak 
of  the  quadratic  form  is  determined,  only  local  points  are  used  to  obtain  a  more  accurate 
estimate  of  position.  The  radial  basis  function  network  uses  all  data  points  to  estimate  the 
position  of  the  source,  but  the  contribution  from  those  which  are  distant  from  the  input 
vector  is  minimal  so  that  it  is  effectively  a  local  method. 

Both the multilayer perceptron and the radial basis function network require knowledge of the signal-to-noise ratio. A different value of $\sigma^2$ requires a different network. Thus, a network must be constructed for each different signal-to-noise ratio regime, or some means of adapting the weights of the multilayer perceptron or the nonlinear functions in the radial basis function network must be employed. The maximum likelihood method does not require knowledge of $\sigma^2$. For a given image vector, the position of the maximum of the quadratic form $Q$ (see Equation (34)) is independent of $\sigma^2$.

The particular radial basis function network approximation derived in this paper is valid for low signal-to-noise ratios. The multilayer perceptron has been derived for high signal-to-noise ratios, but could be extended to lower signal-to-noise ratios by including higher order terms in the expansion of the error. The maximum likelihood method is appropriate for any value of $\sigma^2$, though the analytic expressions for the bias and variance in the estimate, derived in the appendix, are valid only in a high signal-to-noise approximation.


6  Discussion


This paper has considered a functional approximation approach to point-source location using an array of sensors. Specifically, the array of sensors considered was a focal-plane array radar, and the position of a single source in the scene giving rise to a measured image was regarded as a nonlinear function of that image. The problem then is, given some training data comprising representative images of point sources and their associated positions, to define a mapping from image space to position which is robust to noise on the image. A minimum mean square error approach was adopted since this gives an approximant which is the expected value of the position for a given image.

the particular radial basis function network which arises as a consequence of approximating the integral (28) by a finite sum over the training set.

The approximant which gives the expected value of the a posteriori density may be expressed as a sum over the training set, giving the form of a radial basis function network. This is valid when the noise variance is large compared to the sample spacing of data points in the training set. In the other extreme of small noise, some parametric form must be assumed for the approximant, and we adopted a multilayer perceptron architecture. We evaluated the performance of both of these feed-forward network architectures and compared them with a maximum likelihood approach.
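A minimal sketch of the low signal-to-noise-ratio approximant described above, written as a normalised Gaussian radial basis function network over the training set: each training image is a basis-function centre and each training position an output weight. It assumes white Gaussian noise of known variance σ², unit-amplitude training sources, a uniform prior over the training positions and a hypothetical Gaussian-beam point-spread function `psf` for a five-beam linear array; these choices and all names are illustrative, not the paper's exact network.

```python
import numpy as np

def psf(theta, centres=np.arange(-2.0, 3.0), width=0.6):
    """Hypothetical point-spread vector h(theta): Gaussian beam responses
    of a five-beam linear array (an assumption, not the paper's model)."""
    return np.exp(-0.5 * ((theta - centres) / width) ** 2)

def rbf_position_estimate(image, train_theta, train_images, sigma):
    """Finite-sum approximation to the posterior mean E[theta | image].

    A normalised Gaussian RBF network: one basis function per training
    image, with the training positions as output weights. Assumes white
    Gaussian noise of variance sigma**2 and a uniform prior over the
    training positions."""
    d2 = np.sum((train_images - image) ** 2, axis=1)   # squared distances
    w = np.exp(-(d2 - d2.min()) / (2.0 * sigma ** 2))  # shifted for stability
    return np.dot(w, train_theta) / np.sum(w)

# Training set: noiseless unit-amplitude images on a fine position grid.
train_theta = np.linspace(-2.0, 2.0, 401)
train_images = np.array([psf(t) for t in train_theta])

rng = np.random.default_rng(0)
sigma = 0.3
image = psf(0.37) + sigma * rng.standard_normal(5)
estimate = rbf_position_estimate(image, train_theta, train_images, sigma)
print(f"RBF estimate for a source at 0.37: {estimate:.3f}")
```

Note that σ sets the width of every basis function, so this approximant depends on the noise power, and its output is always a convex combination of the training positions.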

For a low signal-to-noise ratio, the errors in the position estimate for the radial basis function network and the maximum likelihood approach are very similar. At higher signal-to-noise ratios, the radial basis function approximation becomes increasingly invalid and a prescribed parametric form (the multilayer perceptron) was used. This was trained using an augmented error criterion to simulate the effects of noise on the expected data ‘in operation’. An MLP with 5 hidden units did not perform as well as the maximum likelihood method. Increasing the number of hidden units to 25 improved the performance, but the maximum likelihood method was still superior. A further advantage of the maximum likelihood method is that it does not depend on the noise power, σ², whereas the RBF and MLP approximants are functions of σ².

One advantage of exploring the MLP architecture is that it is general purpose, and there is the potential for implementation on the focal plane of the array, which may give significant data reduction on the array; this may be very important in some applications. The approach of regarding the position of a point source as a nonlinear function of the image also has application to staring array sensors other than radar in which sub-pixel accuracy in locating a source is required (e.g. [5]). The work can be extended to two-dimensional arrays (see [27] for the maximum likelihood method applied to square and hexagonal two-dimensional arrays), but the study in this paper was restricted to one dimension for illustration purposes.

There are several possible avenues for further work. Improved performance may be obtained for the MLP if the nonlinear functions at the hidden nodes were better matched to this particular problem. For some applications it may also be desirable for the position estimate to be unbiased, so that integration can take place after position estimation. Application to real radar data will require some modification to the MLP architecture, since the data vectors will be complex and the MLP must be designed so that it is insensitive to an arbitrary phase associated with the target; this is not a difficult problem. A further question is whether the functional approximation approach can be applied to multi-source scenes. A direct implementation of the method would require a vast amount of training data covering all possible positions and relative amplitudes of the sources, so some other architecture may be more appropriate (see [9, 12] for an approach based on Hopfield networks). However, the single-source assumption is valid where range and Doppler processing can be employed to discriminate between sources and, after all, is the assumption which monopulse radar makes.
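The phase insensitivity mentioned above is easy to see for the maximum likelihood statistic if, as an assumption beyond the paper's real-valued derivation, the image is complex and the noise white: replacing the transpose in (42) by the conjugate transpose and taking the modulus makes the statistic invariant to a common phase factor on the image. This is only an illustration of that invariance with hypothetical values; it is not a design for a complex-valued MLP.

```python
import numpy as np

def complex_ml_statistic(image, h):
    """Complex analogue of (42) for white noise: |h^H I_n|^2 / (h^H h).

    np.vdot conjugates its first argument, so np.vdot(h, image) = h^H I_n.
    """
    return np.abs(np.vdot(h, image)) ** 2 / np.vdot(h, h).real

rng = np.random.default_rng(2)
h = np.array([0.1, 0.6, 1.0, 0.6, 0.1])            # hypothetical PSF samples
image = 1.5 * h + 0.1 * (rng.standard_normal(5) + 1j * rng.standard_normal(5))
rotated = np.exp(1j * 0.8) * image                 # arbitrary target phase
print(complex_ml_statistic(image, h), complex_ml_statistic(rotated, h))
```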

In conclusion, a novel approach to point-source location based on function approximation has been presented and compared with a more traditional solution. The potential for implementation of the method on the focal plane of the sensor could be significant.
Appendix A


Maximum  Likelihood  Solution 


The conditional probability of an observation, I_n ∈ ℝ^m, given the position, θ, and amplitude, A, of a source for a Gaussian noise process is given by

$$p(I_n \mid \theta, A) = \frac{1}{(2\pi)^{m/2}|N|^{1/2}} \exp\left(-\frac{1}{2}\left(I_n - A h(\theta)\right)^T N^{-1}\left(I_n - A h(\theta)\right)\right) \qquad (36)$$

where N is the m × m positive-definite covariance matrix of the additive noise.

Thus, the log likelihood, log(p(I_n | θ, A)), is given by

$$\log p(I_n \mid \theta, A) = -\frac{m}{2}\log(2\pi) - \frac{1}{2}\log|N| - \frac{1}{2}\left(I_n - I_\theta\right)^T N^{-1}\left(I_n - I_\theta\right) \qquad (37)$$

where

$$I_\theta = A h(\theta) \qquad (38)$$

is the image of a source of amplitude A at position θ in the absence of noise.

Since the first two terms in Equation (37) are independent of the parameters A and θ, the maximum of the likelihood function occurs when the quadratic form

$$\left(I_n - I_\theta\right)^T N^{-1}\left(I_n - I_\theta\right) \qquad (39)$$

is a minimum.

Differentiating the above expression with respect to the parameter A and equating to zero gives the maximum likelihood solution for A, expressed in terms of h(θ) as

$$A = \frac{h^T(\theta)\, N^{-1} I_n}{h^T(\theta)\, N^{-1} h(\theta)} \qquad (40)$$
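The two closed-form steps above, the log-likelihood (37) and the amplitude estimate (40), can be sketched directly in NumPy. This minimal sketch assumes a symmetric positive-definite noise covariance; the point-spread samples, covariance values and function names are hypothetical.

```python
import numpy as np

def log_likelihood(image, h, amp, noise_cov):
    """Gaussian log-likelihood log p(I_n | theta, A) of Equation (37);
    h is the point-spread vector h(theta) at the trial position."""
    m = image.size
    resid = image - amp * h                           # I_n - I_theta
    _, logdet = np.linalg.slogdet(noise_cov)          # log |N|
    quad = resid @ np.linalg.solve(noise_cov, resid)  # quadratic form (39)
    return -0.5 * m * np.log(2.0 * np.pi) - 0.5 * logdet - 0.5 * quad

def ml_amplitude(image, h, noise_cov):
    """Closed-form maximum likelihood amplitude of Equation (40)."""
    ninv_h = np.linalg.solve(noise_cov, h)            # N^{-1} h (N symmetric)
    return (ninv_h @ image) / (ninv_h @ h)

# Hypothetical example values, not taken from the paper.
h = np.array([0.1, 0.6, 1.0, 0.6, 0.1])
noise_cov = np.diag([0.04, 0.05, 0.04, 0.06, 0.05])
rng = np.random.default_rng(1)
image = 2.0 * h + rng.multivariate_normal(np.zeros(5), noise_cov)
a_hat = ml_amplitude(image, h, noise_cov)
print(a_hat, log_likelihood(image, h, a_hat, noise_cov))
```

Because (39) is quadratic in A with positive curvature h^T N^{-1} h, moving the amplitude away from the value returned by `ml_amplitude` in either direction lowers the log-likelihood.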


Substituting for A into expression (39) and simplifying the algebra, we find that the expression is now a function of θ alone (through h(θ)) and that a minimum occurs when the quantity E, given by

$$E = I_n^T N^{-1} I_n - \frac{\left|h^T N^{-1} I_n\right|^2}{h^T N^{-1} h} \qquad (41)$$

is a minimum. Since the first term is independent of θ, the maximum likelihood solution for θ occurs when the second term (including the minus sign) is a minimum, i.e. when

$$\frac{\left|h^T N^{-1} I_n\right|^2}{h^T N^{-1} h} \qquad (42)$$

is a maximum.
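A quick numerical check, with hypothetical values, that substituting the amplitude (40) into the quadratic form (39) gives the quantity (41), so that minimising (41) over θ is the same as maximising (42).

```python
import numpy as np

def ml_statistic(image, h, noise_cov):
    """Quantity (42): |h^T N^{-1} I_n|^2 / (h^T N^{-1} h)."""
    ninv_h = np.linalg.solve(noise_cov, h)
    return (ninv_h @ image) ** 2 / (ninv_h @ h)

def concentrated_error(image, h, noise_cov):
    """Quantity (41): I_n^T N^{-1} I_n minus the statistic (42)."""
    return image @ np.linalg.solve(noise_cov, image) - ml_statistic(image, h, noise_cov)

def quadratic_form_at_ml_amplitude(image, h, noise_cov):
    """Quadratic form (39) evaluated at the amplitude of Equation (40)."""
    ninv_h = np.linalg.solve(noise_cov, h)
    amp = (ninv_h @ image) / (ninv_h @ h)
    resid = image - amp * h
    return resid @ np.linalg.solve(noise_cov, resid)

# Hypothetical values: a correlated (tridiagonal) noise covariance.
h = np.array([0.1, 0.6, 1.0, 0.6, 0.1])
noise_cov = 0.05 * np.eye(5) + 0.01 * (np.eye(5, k=1) + np.eye(5, k=-1))
image = np.array([0.3, 1.1, 2.1, 1.3, 0.2])
print(concentrated_error(image, h, noise_cov),
      quadratic_form_at_ml_amplitude(image, h, noise_cov))
```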

For N = σ²I, the quadratic form reduces to

$$\frac{\left|h^T I_n\right|^2}{\sigma^2\, h^T h} = \frac{\left|\hat{h}^T I_n\right|^2}{\sigma^2} \qquad (43)$$
where ĥ is the normalised point-spread function vector (normalised to unit magnitude). This is a function of θ alone, and the maximum likelihood estimate of position is the value of θ at which the above quantity attains its maximum. To determine this value some means of nonlinear optimisation must be employed since, in general, it may not be possible to write down a solution in closed form.
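The nonlinear optimisation can be as simple as a coarse grid search followed by a one-dimensional refinement. A minimal sketch for the white-noise case (43), assuming a hypothetical Gaussian-beam point-spread function for a five-beam linear array; the beam centres, width, search interval and the choice of golden-section search are illustrative assumptions, not the paper's method.

```python
import numpy as np

CENTRES = np.arange(-2.0, 3.0)   # hypothetical beam centres of a 1-D array
WIDTH = 0.6                      # hypothetical Gaussian beam parameter

def psf(theta):
    """Hypothetical point-spread vector h(theta) (Gaussian beams)."""
    return np.exp(-0.5 * ((theta - CENTRES) / WIDTH) ** 2)

def white_noise_statistic(theta, image):
    """|h_hat^T I_n|^2 from (43); the constant factor 1/sigma^2 is dropped."""
    h = psf(theta)
    return (h @ image) ** 2 / (h @ h)

def ml_position(image, lo=-2.0, hi=2.0, n_grid=201, tol=1e-8):
    """Grid search to find the highest grid point, then golden-section search."""
    grid = np.linspace(lo, hi, n_grid)
    k = int(np.argmax([white_noise_statistic(t, image) for t in grid]))
    a, b = grid[max(k - 1, 0)], grid[min(k + 1, n_grid - 1)]
    g = (np.sqrt(5.0) - 1.0) / 2.0               # inverse golden ratio
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if white_noise_statistic(c, image) > white_noise_statistic(d, image):
            b, d = d, c                          # maximum lies in [a, d]
            c = b - g * (b - a)
        else:
            a, c = c, d                          # maximum lies in [c, b]
            d = a + g * (b - a)
    return 0.5 * (a + b)

print(ml_position(2.0 * psf(0.37)))
```

For a noiseless image the statistic is maximised exactly at the true position (by the Cauchy-Schwarz inequality), and the amplitude does not affect where the maximum lies.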

However, we can obtain expressions for the bias and the variance of the maximum likelihood estimator (at least in a small noise approximation) from a perturbation expansion as follows. Differentiating expression (41) with respect to θ and equating to zero gives

$$\left(h^T N^{-1} h\right)\frac{dh^T}{d\theta} N^{-1} I_n - \left(h^T N^{-1} I_n\right)\frac{dh^T}{d\theta} N^{-1} h = 0 \qquad (44)$$


In the absence of noise, the solution is given by θ = θ₀. When noise is present, let the image be given by

$$I_n = A_0 h(\theta_0) + n \qquad (45)$$

where A₀ and θ₀ are the true values of amplitude and position and n is the perturbation of the image due to noise.
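As a numerical check of Equation (44), the sketch below evaluates its left-hand side for white noise N = σ²I and a noiseless image generated from (45) with n = 0: the residual vanishes at the true position θ₀ and changes sign across it. The Gaussian-beam point-spread function, its derivative and all numerical values are hypothetical.

```python
import numpy as np

CENTRES = np.arange(-2.0, 3.0)   # hypothetical beam centres
WIDTH = 0.6                      # hypothetical Gaussian beam parameter

def psf(theta):
    return np.exp(-0.5 * ((theta - CENTRES) / WIDTH) ** 2)

def dpsf(theta):
    """dh/dtheta for the hypothetical Gaussian-beam PSF."""
    return -(theta - CENTRES) / WIDTH ** 2 * psf(theta)

def stationarity_residual(theta, image, noise_cov):
    """Left-hand side of Equation (44) at a trial position theta."""
    h, dh = psf(theta), dpsf(theta)
    ninv_h = np.linalg.solve(noise_cov, h)        # N^{-1} h
    ninv_i = np.linalg.solve(noise_cov, image)    # N^{-1} I_n
    return (h @ ninv_h) * (dh @ ninv_i) - (h @ ninv_i) * (dh @ ninv_h)

# Noiseless image from the model (45) with n = 0 (hypothetical values).
A0, THETA0 = 2.0, 0.37
noise_cov = 0.05 ** 2 * np.eye(5)
clean = A0 * psf(THETA0)
for t in (THETA0 - 0.1, THETA0, THETA0 + 0.1):
    print(t, stationarity_residual(t, clean, noise_cov))
```

Since the residual equals b²/(2a) times the derivative of (42), with a = h^T N^{-1} I_n and b = h^T N^{-1} h, its sign follows the slope of (42) whenever a is positive.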

For a small perturbation of the noiseless image by the noise vector n, let the solution for θ be θ₀ + δθ. Substituting this into Equation (44) and expanding the functions h(θ) and dh/dθ using Taylor's theorem leads to solutions for the mean and the standard deviation³ of the estimate as


and 


where 


and 


(46) 

(47) 

(48) 

(49) 


Defining the signal-to-noise ratio to be the total power received by the array of sensors for a source in a reference position (usually taken to be the centre of the field of view) divided by the noise power per receiver⁴,

$$\mathrm{snr}_{ref} = \frac{A_0^2}{\sigma^2}\, h^T(\theta_{ref})\, h(\theta_{ref}) \qquad (50)$$

³ The standard deviation may also be obtained by using the result that the maximum likelihood estimate is asymptotically normally distributed with a dispersion matrix depending on the likelihood function [14]. This was the approach considered in [27].

⁴ This gives a position-independent definition of signal-to-noise ratio.
then the standard deviation may be written in the form

$$\sigma_{\hat\theta} = \frac{\theta_B}{K \sqrt{\mathrm{snr}_{ref}}} \qquad (51)$$


where 

K*  =  {h-(0)h(0)h-(e„,)h(ertt))  &B  (52) 

and θ_B is the beamwidth. This is the form often quoted for the tracking error in monopulse radar. It shows that the standard deviation is inversely proportional to the square root of the signal-to-noise ratio, with a constant of proportionality that is a function of position. Figure 14 plots the quantity K as a function of position.
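For white noise, N = σ²I, the small-noise standard deviation can also be computed by the route of footnote 3, from the Fisher information for θ after eliminating the unknown amplitude: σ_θ̂² ≈ σ² / (A₀² D(θ)), with D = (dh/dθ)^T(dh/dθ) − (h^T dh/dθ)² / (h^T h). Combined with the definition of snr_ref, this gives K = θ_B (D / h^T(θ_ref)h(θ_ref))^{1/2} in the form (51). The sketch evaluates that expression as a cross-check, not as a transcription of Equation (52); it assumes a hypothetical Gaussian-beam array, θ_B = 1 and θ_ref = 0.

```python
import numpy as np

CENTRES = np.arange(-2.0, 3.0)   # hypothetical beam centres of a 1-D array
WIDTH = 0.6                      # hypothetical Gaussian beam parameter
THETA_B = 1.0                    # hypothetical beamwidth (position units)
THETA_REF = 0.0                  # reference position: centre of field of view

def psf(theta):
    return np.exp(-0.5 * ((theta - CENTRES) / WIDTH) ** 2)

def dpsf(theta):
    return -(theta - CENTRES) / WIDTH ** 2 * psf(theta)

def snr_ref(amp, sigma):
    """Position-independent signal-to-noise ratio of Equation (50)."""
    href = psf(THETA_REF)
    return amp ** 2 * (href @ href) / sigma ** 2

def k_factor(theta):
    """Dimensionless K(theta) with std = THETA_B / (K * sqrt(snr_ref))."""
    h, dh = psf(theta), dpsf(theta)
    # Fisher information for theta per unit A^2/sigma^2, amplitude eliminated.
    d = dh @ dh - (h @ dh) ** 2 / (h @ h)
    href = psf(THETA_REF)
    return THETA_B * np.sqrt(d / (href @ href))

def predicted_std(theta, amp, sigma):
    """Small-noise standard deviation of the ML position estimate, form (51)."""
    return THETA_B / (k_factor(theta) * np.sqrt(snr_ref(amp, sigma)))

for theta in (-1.5, -0.5, 0.0, 0.5, 1.5):
    print(f"theta = {theta:+.1f}  K = {k_factor(theta):.3f}")
```

Because K appears in the denominator of (51), positions where K is small are those where the position estimate is least precise for a given snr_ref.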

[Plot: K factor (vertical axis) against position from −2.0 to 2.0 (horizontal axis)]

Figure 14: The quantity K as a function of position for the linear array.


Similarly,  the  mean  of  the  estimate  may  be  written 


(<#) 


snrre/ 


(53) 


where 

B,  =  ^  ( h^o  -  !*•**)  **(•„/)*(•«/)  (54) 

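This shows that the mean error (the bias) is inversely proportional to the signal-to-noise ratio. Figure 15 plots the quantity B as a function of position.

These results show that, at high signal-to-noise ratio, the bias is small compared to the standard deviation: the bias falls as 1/snr_ref whereas the standard deviation falls only as 1/√snr_ref, so their ratio decreases as 1/√snr_ref.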
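These scalings can be checked with a small Monte Carlo experiment. The sketch assumes white noise, a hypothetical five-beam Gaussian-beam array, a true position θ₀ = 0.37 and unit amplitude, and replaces the nonlinear optimisation by a grid search with a three-point parabolic refinement; none of these choices comes from the paper.

```python
import numpy as np

CENTRES = np.arange(-2.0, 3.0)                 # hypothetical beam centres
WIDTH = 0.6                                    # hypothetical beam parameter
GRID = np.linspace(-2.5, 2.5, 501)             # search grid for theta

def psf(theta):
    """Rows are h(theta) for each value in theta (scalar or 1-D array)."""
    theta = np.atleast_1d(theta)[:, None]
    return np.exp(-0.5 * ((theta - CENTRES) / WIDTH) ** 2)

H_HAT = psf(GRID)
H_HAT /= np.linalg.norm(H_HAT, axis=1, keepdims=True)   # unit-norm PSFs

def ml_positions(images):
    """White-noise ML estimates: maximise |h_hat^T I|^2 over GRID, then
    refine each maximum with a three-point parabolic fit."""
    stat = (images @ H_HAT.T) ** 2
    k = np.clip(np.argmax(stat, axis=1), 1, GRID.size - 2)
    rows = np.arange(images.shape[0])
    y0, y1, y2 = stat[rows, k - 1], stat[rows, k], stat[rows, k + 1]
    curv = y0 - 2.0 * y1 + y2                  # negative at a proper maximum
    safe = np.where(curv < 0, curv, -1.0)      # avoid division by zero
    offset = np.where(curv < 0, 0.5 * (y0 - y2) / safe, 0.0)
    return GRID[k] + offset * (GRID[1] - GRID[0])

def monte_carlo(theta0, amp, sigma, trials=4000, seed=0):
    """Sample bias and standard deviation of the ML estimate under (45)."""
    rng = np.random.default_rng(seed)
    clean = amp * psf(theta0)[0]
    images = clean + sigma * rng.standard_normal((trials, CENTRES.size))
    est = ml_positions(images)
    return est.mean() - theta0, est.std()

for sigma in (0.2, 0.05):
    bias, std = monte_carlo(0.37, 1.0, sigma)
    print(f"sigma = {sigma}: bias = {bias:+.4f}, std = {std:.4f}")
```

Under these assumptions the sample standard deviation at σ = 0.05 should lie close to the small-noise (Cramér-Rao) prediction, and the sample bias should be much smaller than the standard deviation.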
Appendix B  Expected Error


Let the pattern x_p be corrupted by additive noise, n, so that the error for pattern x_p is

$$E_p = E(x_p + n) = E(x_p) + (n \cdot \nabla) E\big|_{x_p} + \frac{1}{2}(n \cdot \nabla)^2 E\big|_{x_p} \qquad (55)$$

expanding by Taylor's theorem and assuming that n is small, so that terms of O(|n|³) may be neglected. For ⟨n⟩ = 0, the expected error (averaged over all noise vectors) is


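$$\langle E_p \rangle = E(x_p) + \frac{1}{2}\left\langle n^T H_p n \right\rangle \qquad (56)$$

where E(x_p) is the error in the absence of noise and ½⟨n^T H_p n⟩ is an additional error term, in which H_p is the Hessian with respect to the data-space components, evaluated for the pth pattern:

$$\left(H_p\right)_{ij} = \left.\frac{\partial^2 E}{\partial x_i\, \partial x_j}\right|_{x_p} \qquad (57)$$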

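For ⟨n_i n_j⟩ = σ²δ_ij, the additional error term may be written

$$\frac{1}{2}\left\langle n^T H_p n \right\rangle = \frac{\sigma^2}{2}\,\mathrm{Tr}(H_p) \qquad (58)$$

where Tr is the matrix trace operation. Averaging over all P data patterns gives the mean expected error

$$\langle E_T \rangle = \frac{1}{P}\sum_{p=1}^{P} E(x_p) + \frac{\sigma^2}{2P}\sum_{p=1}^{P}\mathrm{Tr}(H_p) \qquad (59)$$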
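The expansion can be checked numerically for any differentiable error function. The sketch below assumes a small tanh MLP with hand-chosen, purely hypothetical weights and a squared error on a single output; it is not the paper's network or training procedure. The Hessian trace is approximated by central second differences, and the Monte Carlo average uses antithetic noise pairs (n, −n) so that the odd-order Taylor terms cancel exactly. For small σ the two estimates of the additional error term should agree to within the Monte Carlo scatter.

```python
import numpy as np

# Hypothetical 5-input, 3-hidden-unit tanh MLP; weights chosen by hand.
W = np.array([[0.5, -0.3, 0.8, 0.1, -0.6],
              [-0.2, 0.7, 0.4, -0.5, 0.3],
              [0.9, 0.1, -0.4, 0.6, 0.2]])
B = np.array([0.1, -0.2, 0.05])
V = np.array([1.0, -0.8, 0.6])

def mlp(x):
    """Scalar output for one pattern x, or a vector of outputs for a batch (rows)."""
    return np.tanh(x @ W.T + B) @ V

def error(x, target):
    """Squared error E(x) for a single target value."""
    return (mlp(x) - target) ** 2

def hessian_trace(x, target, eps=1e-3):
    """Tr(H_p) from central second differences along each input axis."""
    e0 = error(x, target)
    total = 0.0
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        total += (error(x + step, target) - 2.0 * e0 + error(x - step, target)) / eps ** 2
    return total

def expected_error(x, target, sigma):
    """Equations (56) and (58): E(x_p) + (sigma^2 / 2) Tr(H_p)."""
    return error(x, target) + 0.5 * sigma ** 2 * hessian_trace(x, target)

def mean_expected_error(patterns, targets, sigma):
    """Equation (59): average of (56) over the P training patterns."""
    return np.mean([expected_error(x, t, sigma) for x, t in zip(patterns, targets)])

def monte_carlo_error(x, target, sigma, trials=20000, seed=0):
    """Noise-averaged error; antithetic pairs (n, -n) cancel the odd Taylor terms."""
    rng = np.random.default_rng(seed)
    n = sigma * rng.standard_normal((trials, x.size))
    return 0.5 * np.mean(error(x + n, target) + error(x - n, target))

x_p = np.array([0.1, 0.6, 1.0, 0.6, 0.1])
t_p = mlp(x_p) + 0.2
sigma = 0.01
extra_taylor = expected_error(x_p, t_p, sigma) - error(x_p, t_p)
extra_mc = monte_carlo_error(x_p, t_p, sigma) - error(x_p, t_p)
print(extra_taylor, extra_mc)
```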